PREFACE


This report was prepared as part of Rand's DoD Training and Manpower Management Program, sponsored by the Human Resources Research Office of the Defense Advanced Research Projects Agency (ARPA). With manpower issues assuming an even greater importance in defense planning and budgeting, it is the purpose of this research program to develop broad strategies and specific solutions for dealing with present and future military manpower problems. This includes the development of new research methodologies for examining broad classes of manpower problems, as well as specific problem-oriented research. In addition to providing analysis of current and future manpower issues, it is hoped that this research program will contribute to a better general understanding of the manpower problems confronting the Department of Defense.

We believe decision-theoretic psychometrics holds considerable promise for military selection, training, and other applications. In the past, use of this technique has been hampered by the need to orient people to a new way of answering questions, and by the need to process the much greater amount of information the method yields.

Because computers now offer a reasonable and, in many cases, a cost-attractive solution to these problems, we have devised programs and procedures for the on-line administration of tests according to the requirements of decision-theoretic psychometrics. At this time, these programs are running on certain IBM 360/370 computer systems with graphic capability, on the IMLAC PDS-1 "smart terminal" computer, and on the PLATO IV system.

This report provides the rationale for these applications, and thus should be of interest to potential users and adapters of these programs, as well as to educators interested in examining in depth the implications of this new methodology.


SUMMARY 


A student's choice of an answer to a test question is a coarse measure of his knowledge about the subject matter of the question. Much finer measurement might be achieved if the student were asked to estimate, for each possible answer, the probability that it is the correct one. Such a procedure could yield two classes of benefits: (a) students could learn the language of numerical probability and use it to communicate uncertainty, and (b) the learning of other subjects could be facilitated.

This report describes the rationale underlying a procedure for eliciting personal estimates of probabilities utilizing a proper scoring rule, and illustrates some new techniques for calibrating those probabilities and providing better feedback to students learning to assess uncertainty. In addition, new results are presented comparing the incentive for study, rehearsal, and practice provided by the proper scoring rule with that provided by the simple choice procedure, and concerning the potential effect of cutoff scores and prizes upon student behavior.

A companion report describes an interactive computer program incorporating these procedures. See W. L. Sibley, A Prototype Computer Program for Interactive Computer Administered Admissible Probability Measurement, R-1258-ARPA, April 1974.
RATIONALE OF COMPUTER-ADMINISTERED ADMISSIBLE
PROBABILITY MEASUREMENT


1.  ELICITATION  OF  PERSONAL  PROBABILITIES  IN  EDUCATION 

Along with the recent growth of the theory and application of the mathematics of decisionmaking has come an increased interest in expressing uncertainty in terms of personal probabilities. Most of the attention in this area has been focused upon eliciting personal probabilities from decisionmakers and experts to guide policy decisions [1-3]. However, at the end of his comprehensive and excellent review of this area, Savage [3] refers to potential educational applications of these techniques and states:


Proper scoring rules hold forth promise as more sophisticated ways of administering multiple-choice tests in certain educational situations. The student is invited not merely to choose one [answer] (or possibly none) but to show in some way how his opinion is distributed over the [answers], subject to a proper scoring rule* or a rough facsimile thereof.

Though requiring more student time per item, these methods should result in more discrimination per item than ordinary multiple-choice tests, with a possible net gain. Also, they seem to open a wealth of opportunities for the educational experimenter.

Above all, the educational advantage of training people--possibly beginning in early childhood--to assay the strengths of their own opinions and to meet risk with judgment seems inestimable. The usual tests and the language habits of our culture tend to promote confusion between certainty and belief. They encourage both the vice of acting and speaking as though we were certain when we are only fairly sure and that of acting and speaking as though the opinions we do have were worthless when they are not very strong.

Effects of nonlinearity in educational testing** deserve some thought, but presumably nonlinearity is not a severe threat when a test consists of a large number of items. One source of nonlinearity that has been pointed out to me is this: A student competing with others for a single prize is motivated to respond so as to maximize the probability that his score will be the highest of all. This need not be consistent with maximizing his expected score, and presumably situations could be devised in which the difference would be important.

*Described and discussed in Sec. 5.3.
**These effects are discussed in Sec. 12.

This brief statement characterizes both the promises and the problems of eliciting personal probabilities from students. The promises come from two educational goals that might be served by this application:

1. As a subject matter and skill that is valued in and of itself. For example, it is important for students to learn to discriminate degrees of uncertainty and to be able to communicate uncertainty using the language of numerical probability.

2. As a means of facilitating the learning of other subject matter, e.g., by providing more information about a student's state of knowledge.

The problems reside largely in two areas:

1. Students must be taught a new way of answering questions, and they must overcome bad habits and inappropriate sets induced by their prior test-taking experience.

2. Great care must be exercised in ensuring that the incentive structure acting on the student does in fact correspond to that assumed in the decision-theoretic derivation of the method, i.e., the student must be motivated to attempt to maximize his expected score, rather than to maximize the probability of exceeding some standard or surpassing his classmates. This is a subtle point we discuss at greater length in Sec. 12 below.

The purpose of this report is to describe the rationale underlying a procedure for eliciting personal estimates of probabilities utilizing a proper scoring rule, and to illustrate some new techniques for calibrating personal probabilities and providing better feedback to students learning to assess uncertainty. In addition, new results are presented comparing the incentive for study, rehearsal, and practice provided by the proper scoring rule with that provided by the simple choice procedure, and concerning the potential effect of cutoff scores upon student behavior.

A companion report [4] describes an experimental version of an interactive computer program incorporating these procedures and focuses upon the first problem mentioned above.

2.  THE  CONTEXT  OF  TESTING 

Students are asked a series of questions to ascertain their knowledge of the subject matter represented by the questions. A test item is composed of a question and a list of k (k = 2, 3, ...) possible answers, one and only one of which is correct. A "test" is composed of n of these items, usually answered in sequence, where n typically has a value between 10 and 100.


3. KNOWLEDGE AS A PROBABILITY DISTRIBUTION

While a person holding the answer key is not at all uncertain about which answer to a question is designated "correct," a student may encounter a certain amount of uncertainty. In information-theoretic terms [5], that amount is

$$U = -\sum_{i=1}^{k} p_i \log_2 p_i ,$$

where $p_i$ is the likelihood (according to the student's view of the situation) of the event "Answer i is the correct answer." Because the $p_i$'s may be viewed as probabilities of mutually exclusive and collectively exhaustive events, we have

$$\sum_{i=1}^{k} p_i = 1 .$$

The uncertainty measure, called "entropy" by information theorists, achieves its maximum value ($\log_2 k$) when all the $p_i$'s are equal and achieves its minimum value (zero) when one $p_i$ is unity and the rest are zero.
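The entropy measure above is easy to compute directly. The following is a minimal Python sketch (the function name `uncertainty` is ours, not the report's); it checks the two extremes just stated, $\log_2 k$ for equal probabilities and zero when one answer has probability one, and it treats $0 \log 0$ as 0, the usual convention.

```python
import math


def uncertainty(p):
    """Entropy U = -sum(p_i * log2(p_i)) of a student's distribution p.

    Terms with p_i == 0 are skipped, following the convention 0 * log 0 = 0.
    """
    if any(pi < 0 for pi in p) or not math.isclose(sum(p), 1.0):
        raise ValueError("p must be nonnegative and sum to 1")
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)


k = 3
print(uncertainty([1 / k] * k), math.log2(k))  # equal p_i: the maximum, log2 k
print(uncertainty([1.0, 0.0, 0.0]))            # one p_i equal to 1: the minimum, 0
```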

There may be several sources of this uncertainty. For example, the student may not be familiar with the standards and values of the writer of the test item; the student may not comprehend all of the language used in the test item; most important, he may not know enough facts and reasons to arrive unequivocally at the correct answer.

The uncertainty measure itself is unsatisfactory as a measure of useful knowledge, because it is symmetric, or nondirectional, with respect to the answers. According to this measure, a student would have minimal uncertainty (and maximal information) whenever one of the $p_i$'s equals one. A student holding a probability of one for an incorrect answer possesses just as much information (in his own view) as does another student holding a probability of one for the correct answer. Uncertainty can serve as a measure of learning, but education and training are concerned with what is learned and must focus on the probability associated with the correct answer. Before a student is exposed to a subject matter and tries to learn it, he might be expected to be uncertain about answers to questions. If a question has three answers, the student's probability associated with the correct answer might fluctuate over time but remain close to the value of 1/3 corresponding to maximal uncertainty, as shown by the first segment of the curve in Fig. 1.
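To make the nondirectionality concrete, here is a small Python sketch with two hypothetical students on a three-answer item, where answer 1 (index 0) is assumed, for illustration only, to be the keyed answer. Both students have zero entropy; only the probability assigned to the correct answer tells them apart.

```python
import math


def uncertainty(p):
    # Entropy in bits; zero-probability answers contribute nothing.
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)


CORRECT = 0  # hypothetical answer key: answer 1 is correct

student_a = [1.0, 0.0, 0.0]  # certain of the correct answer
student_b = [0.0, 1.0, 0.0]  # equally certain of an incorrect answer

for name, p in (("A", student_a), ("B", student_b)):
    print(name, "entropy:", uncertainty(p), "p(correct):", p[CORRECT])
```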

When the student begins to take an active interest in learning the subject matter, the probability might be expected to rise and begin to approach one as the student achieves greater and greater mastery of the subject matter. The student's probability associated with the correct answer, when measured over time, might trace a path similar to the learning curve shown in Fig. 1. Upon completion of the learning phase, and if the student's knowledge or skill is not reinforced, the probability might begin to decline toward 1/3 and trace a forgetting curve such as that shown in Fig. 1.
Fig. 1--Hypothetical trace of probability over time


While these hypothetical curves resemble those found in the psychology of learning, it should be remembered that much of the experimental data in this area are reported in terms of averages over subjects, trials, or both. Such indirect measures must be used because of the discrete nature of the responses made available to the subjects.* If it were possible to take direct and repeated measurements of a subject's personal probabilities, the need for aggregation of data would be greatly reduced and the results of experiments might appear quite different.

*The major exception, response latency, is a measure continuous in the time dimension. Even so, it is frequently averaged because of its instability and, while possibly reflecting uncertainty, it fails to convey the directional information contained in the distribution of personal probabilities.


4. THE EFFECT OF LIMITING RESPONSE OPTIONS

In the true-false and multiple-choice methods of test administration, a student is required to select one and only one of the answers to the test question. Thus, for true-false and two-alternative multiple-choice items, the student's response is constrained to only two possible values; for three-alternative multiple-choice items, the student's response is constrained to only three possible values; and so on. If the student's state of knowledge and degree of uncertainty with respect to the question actually can assume more than k different values, it is clearly impossible to have each different response uniquely associated with a state of knowledge. The student would have to use the same response for several different states of knowledge, and the restricted response set of the choice method would act as a filter inserted in the communication channel between student and teacher or experimenter. The observer of the test behavior could not use the student's response to recover unequivocally the state of knowledge that led to the response.

This limitation can be removed only by increasing the number of response options available to the student. To eliminate the filtering action described above, the number of response options must be greater than or equal to the number of different states of knowledge the student may possess. Because different students, and the same student at different times, may experience a varying number of states of knowledge, and because these numbers are unknown, the safest way of preventing filtering appears to be to allow a very large number of response options.

A mathematically and graphically convenient way of doing this is to allow the student to assign a weight from the real number system to each of the possible answers to the test question. For reasons which will become apparent, let the student's response be a vector $R = (r_1, r_2, \ldots, r_k)$, where

$$0 \le r_i \le 1, \qquad \sum_{i=1}^{k} r_i = 1, \qquad k \ge 2 .$$

Thus, for two-answer questions the student's response corresponds to selecting a point on the line segment [0,1], while for three-answer questions the response corresponds to selecting a point in an equilateral triangle, as shown in Fig. 2. Questions with four possible


Merely allowing a student more response options does not ensure that more information about his states of knowledge will actually be transmitted. The student might, for example, exercise only a minimal number of the options or, for another example, the way he associates the response options with his probabilities might be inconsistent or arbitrary. In either event, the amount of information actually transmitted may be greatly reduced.

A student's state of knowledge, i.e., the facts recalled, reasoning, and other thought processes leading to a probability distribution over the possible answers, is directly observable only by the student himself. The student's responses are, of course, directly observable by others, but there is no biological law that a student's responses must reflect his probabilities. It is, in other words, a matter of free will and volition on the part of the student as to how he associates his response with his probabilities.

In a situation such as this, the best that can be done is to structure the task given the student so that he is rewarded for consistently and accurately associating his response with his probabilities. Although the association is one-to-many, this is implicitly done with the simple choice method of responding used in the administration of achievement, aptitude, and ability tests.


5.1 Simple Choice Testing

To see this, suppose a student wishes to maximize his expected test score. With the most frequently used simple choice scoring system, he earns one point for each correct answer selected and no points for an incorrect answer.* Because his test score is simply the sum of his item scores, his expected test score can be maximized by maximizing the expectation for each item score. Thus, for any item on the test, the decision problem faced by the student is as shown in Table 1, and his optimal strategy is to choose that course of action or response associated with the maximum expected score as defined by the information available to him at the time of making the decision. This information should be reflected in the personal probability distributions** as defined in Sec. 3.

Table 1

DECISION PROBLEM FACED BY STUDENT ANSWERING ITEM
UNDER SIMPLE CHOICE METHOD

                                      Correct Answer
Response                         1        2       ...       k
Choose answer 1                  1        0       ...       0
Choose answer 2                  0        1       ...       0
  ...
Choose answer k                  0        0       ...       1
Probability (that answer
  may be correct)              $p_1$    $p_2$     ...     $p_k$

*It can be assumed without loss of generality that the student receives no points if he omits an item. Thus, the loss of a fraction of a point as illustrated by use of the "correction for guessing" formula does not change the structure of the task. The structure is changed, however, if the penalty for selecting a wrong answer is greater than 1/(k - 1), where k is the number of possible answers to the test question.

**It should be remembered that the probabilities characterize the student, not the item question and answers. One answer is correct; the others are incorrect. Two different students, or the same student at different times, may very well possess different probability distributions over the answers to the same question. The probabilities reflect the information available to the student at the time he must make his decision, and provide his only guide to action.
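The footnote's threshold can be checked numerically. The Python sketch below assumes, as the footnote does, that an omitted item earns 0; it further assumes that a correct choice earns 1 and a wrong choice costs a penalty `c` (the function name and the example distribution are illustrative, not from the report). Choosing the most likely answer has expected score $p_{max} - c(1 - p_{max})$; with the correction-for-guessing penalty $c = 1/(k-1)$ this equals $(k p_{max} - 1)/(k - 1)$, which is never negative because $p_{max} \ge 1/k$, so omitting never helps. A heavier penalty can make omission strictly better.

```python
from fractions import Fraction


def best_action(p, c):
    """Best response on one simple-choice item, with its expected score.

    Assumed payoffs: +1 for a correct choice, -c for a wrong choice,
    0 for omitting. Ties go to the option listed first ("omit").
    """
    options = {"omit": Fraction(0)}
    for i, pi in enumerate(p, start=1):
        options[f"answer {i}"] = pi - c * (1 - pi)
    action = max(options, key=options.get)
    return action, options[action]


p = [Fraction(2, 5), Fraction(1, 3), Fraction(4, 15)]  # hypothetical beliefs, k = 3
print(best_action(p, Fraction(1, 2)))  # correction for guessing, c = 1/(k - 1)
print(best_action(p, Fraction(1)))     # penalty larger than 1/(k - 1)
```

Exact fractions are used so that the tie-free comparison with zero is not blurred by rounding.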
For a student who wishes to score well on a simple choice test, the optimal test-taking decision rule is, for each item, to select the answer he considers most likely to be correct. If two or more answers are tied for maximum probability, it makes no difference which he selects, because the expected score is the same. This decision rule may be displayed graphically for two and three possible answers as shown in Fig. 3.

This analysis makes it apparent that while the simple choice procedure can motivate a student to use a consistent and logical mapping of probabilities onto responses, each response represents a very broad range of probabilities. When a student chooses an answer, all that may be inferred from this response is that he views no other answer as being more likely to be correct.
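The decision rule and its filtering effect can be sketched in a few lines of Python. The function name is hypothetical; it returns the number of an answer with maximal probability, breaking ties by the lowest number, which the text notes costs nothing. Very different states of knowledge, from nearly certain to nearly uniform, produce the same response.

```python
def simple_choice_response(p):
    """Return the 1-based number of an answer with maximal probability.

    Choosing answer i has expected item score p[i] (1 point if correct,
    0 otherwise), so any maximizer is optimal; max() keeps the first one.
    """
    return 1 + max(range(len(p)), key=lambda i: p[i])


states = [
    [0.90, 0.05, 0.05],  # nearly certain
    [0.40, 0.35, 0.25],  # leaning weakly toward answer 1
    [0.34, 0.33, 0.33],  # nearly uniform
]
print([simple_choice_response(p) for p in states])  # the same response each time
```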

Terms such as "well informed," "misinformed," and "uninformed" are sometimes used to describe a person's knowledge with respect to some subject. These and related terms can be used to characterize regions of the personal probability space, as illustrated by the decomposition shown in Fig. 4 for three possible answers. Because each point on the triangle corresponds to a possible probability distribution over the three answers, this classification groups distributions that may have a similar import. For example, if a student had no reason for very strongly preferring any answer over the others, his probability distribution would be located near the center of the triangle and he would be "uninformed" with respect to the item. Figures 3 and 4 may be compared to see what information is yielded by the response-to-probability mapping induced by the simple choice method.

The relations can be summarized as in Table 2.

[Figure: two panels. Two possible answers: the probability line between Answer 1 and Answer 2, divided into a "Choose Answer 1" region and a "Choose Answer 2" region. Three possible answers: the probability triangle with vertices Answer 1, Answer 2, and Answer 3.]
Fig. 3--Optimal decision rules for two and three answers

[Figure: probability triangle with one vertex labeled "correct answer" and two labeled "incorrect answer," divided into regions labeled "well informed," "informed," "partially informed," "uninformed," "misinformed," and "badly misinformed."]
Fig. 4--One possible decomposition of the probability triangle to represent some meaningful categories of knowledge

Table 2

INFERENCES THAT MAY BE DRAWN FROM THE SIMPLE CHOICE RESPONSE

If the student has selected    Student may be:        Student is not:
The correct answer             Well informed          Misinformed
                               Informed               Badly misinformed
                               Partially informed
                               Uninformed
An incorrect answer            Partially informed     Informed
                               Uninformed             Well informed
                               Misinformed
                               Badly misinformed

While the simple choice response is clearly incapable of discriminating many states of knowledge, a free response such as that described in Sec. 4 would have the potential of transmitting a great deal more information about a student's state of knowledge. Will this information actually be transmitted?

5.2 Confidence Testing

Suppose, as before, that the student wishes to maximize his expected test score and that he is allowed to distribute 100 points over the possible answers to each item, as is sometimes done in "confidence testing." This would provide a set of responses fine-grained enough to transmit considerably more information, and it would be quite simple to score the student according to the number of points he allocated to the correct answer. To be more explicit, let $m_{ij}$ be the number of points allocated on the $j$th item to the $i$th answer, where

$$0 \le m_{ij} \le 100 \quad \text{and} \quad \sum_{i=1}^{k} m_{ij} = 100 .$$

Let the test score be

$$S = \sum_{j=1}^{n} m_{*j} ,$$

where $m_{*j}$ is the number of points allocated to the correct answer to item $j$.

The potential impact of this scoring rule upon student behavior may be investigated by finding, as before, the optimal test-taking strategy for a student who wishes to maximize his expected test score. Because his test score is simply the sum of his item scores, his expected test score can be maximized by maximizing the expected score for each item. There are now far too many response options to list in a table, but the expected score for any allocation $(m_1, m_2, \ldots, m_k)$ on a single question may be written as

$$E(m_1, \ldots, m_k;\, p_1, \ldots, p_k) = m_1 p_1 + m_2 p_2 + \cdots + m_k p_k .$$

It is not too difficult to find the optimal decision rule, i.e., to specify for each probability distribution that response (allocation of points) which maximizes the expected item score. Because the labeling of the answers is, in a sense, arbitrary, we may assume without loss of generality that

$$p_1 \ge p_2 \ge p_3 \ge \cdots \ge p_k ,$$

i.e., the answers can be reordered from most likely to least likely to be correct in the view of the student. The decision problem is one of allocating points so as to maximize the sum of products shown below:

$$m_1 p_1 + m_2 p_2 + \cdots + m_k p_k .$$

The points may be placed one at a time, because the placing of a point does not change the structure of the problem. Allocating a point to answer $i$ yields a return of $p_i$, because only that proportion $p_i$ of the point will be added to the sum. If $p_1 > p_2$, then the first point should be placed in the first position in order to yield the largest possible return, $p_1$; the second point should also be placed in the first position; and so on for all 100 points. If $p_1 = p_2$, or if $p_1 = p_2 = p_3$, and so on, the points can be distributed among these maximum probabilities, but there is nothing to be gained by so doing. The optimal test-taking strategy for this scoring rule can be summarized as, "Find an answer that is at least as likely to be correct as any other and allocate all 100 points to this answer."
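The point-by-point argument can be replayed directly. This Python sketch (function names hypothetical) places 100 points one at a time, each on the answer that adds the most to the expected item score under the linear confidence-testing rule, and compares the result with an allocation proportional to a hypothetical student's beliefs.

```python
def allocate_points(p, total=100):
    """Greedy allocation under the linear rule (score = points on the correct answer).

    A point placed on answer i adds p[i] to the expected score no matter
    where earlier points went, so every point lands on a most likely answer.
    """
    m = [0] * len(p)
    for _ in range(total):
        best = max(range(len(p)), key=lambda i: p[i])
        m[best] += 1
    return m


def expected_score(m, p):
    # Expected item score: m_1 p_1 + m_2 p_2 + ... + m_k p_k.
    return sum(mi * pi for mi, pi in zip(m, p))


p = [0.5, 0.3, 0.2]                     # hypothetical beliefs
greedy = allocate_points(p)             # [100, 0, 0]
honest = [round(100 * pi) for pi in p]  # [50, 30, 20]
print(greedy, expected_score(greedy, p))
print(honest, expected_score(honest, p))
```

The honest allocation earns an expected 38 points against 50 for the all-on-one allocation, which is exactly the incentive that collapses confidence testing back into simple choice.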

Thus, this scoring rule induces a mapping of response onto probability that degenerates into the simple choice situation (see Fig. 3). Although many response options are offered to the student, he is maximally rewarded for placing all 100 points on the most likely answer. If a student follows this best test-taking strategy, his responses will be essentially indistinguishable from choice-type responses, and no additional information will be transmitted about his states of knowledge. This example shows that merely offering an increased number of response options does not guarantee that more information will be transmitted.

5.3  Admissible  Probability  Measurement 

What is required is a scoring rule that can motivate the student to use more of the response options, each associated with a small region of probabilities. In the limit, this relation could be expressed as $r_i = f(p_i)$, where $f$ is a monotone increasing function of $p$, and all of the potentially available information could be transmitted. There are other cogent reasons, however, for further constraining $f$ to be the identity function, i.e., $r_i = p_i$.

With the identity relation, the student's responses are directly interpretable as probabilities, and these numerical quantities can be used in the equations of probability, information, and decision theory [1]. Students would be learning to communicate degrees of uncertainty in a universal language of probabilities. For this reason, and in the absence of any compelling reasons to do otherwise, it seems reasonable to require that scoring rules possess the property that a student can maximize his expected score if and only if his responses match his probabilities. Scoring rules satisfying this condition have variously been called "proper" [3], "reproducing" [1,6], and "admissible" [6].

It has been shown that there exist infinitely many scoring rules that induce the identity relation between response and probability [1,6]. Only one, however, possesses the property that the score depends only upon the response assigned to the correct answer, and not upon how the responses are distributed over the other answers when more than two answers are possible [6]. This is the logarithmic scoring rule, which may be written as $S(r_i) = A \log r_i + B$, where $A > 0$.
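A brute-force check illustrates the identity relation for the logarithmic rule and contrasts it with the linear confidence-testing rule of Sec. 5.2. The Python sketch below (function names hypothetical) searches a grid of responses on the three-answer probability triangle, keeping every component at least 0.01 so that the logarithm stays finite, and reports the response with the highest expected score for a fixed belief p. The grid, its spacing, and the base-10 logarithm are assumptions of the sketch; the base, like any choice of $A > 0$ and $B$, does not change which response is best.

```python
import itertools
import math


def expected_log_score(r, p):
    # E[log r_c] when answer c is correct with probability p[c] (A = 1, B = 0).
    return sum(pi * math.log10(ri) for pi, ri in zip(p, r))


def expected_linear_score(r, p):
    # E[r_c]: the 100-point confidence rule of Sec. 5.2, rescaled to 1 point.
    return sum(pi * ri for pi, ri in zip(p, r))


def best_response(p, score, n=100):
    """Grid-search responses (a/n, b/n, c/n) with a, b, c >= 1 for the best one."""
    best, best_value = None, -math.inf
    for a, b in itertools.product(range(1, n), repeat=2):
        c = n - a - b
        if c < 1:
            continue
        r = (a / n, b / n, c / n)
        value = score(r, p)
        if value > best_value:
            best, best_value = r, value
    return best


p = (0.6, 0.3, 0.1)  # hypothetical beliefs
print(best_response(p, expected_log_score))     # matches p
print(best_response(p, expected_linear_score))  # as much weight on answer 1 as the grid allows
```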
Notice that $\log r_i \to -\infty$ as $r_i \to 0$. This means that the logarithmic scoring rule cannot be strictly applied in practice, because if a student ever assigned a response of zero to a correct answer, the logarithmic scoring rule would call for an infinite penalty. However, by restricting the range of possible responses that a student may use, so that $r_i \ge d$ (where $d$ might be set at 0.01 or some other small value), the need for a very large penalty is avoided, but with the sacrifice of some accuracy in measuring very small probabilities [6].

For many purposes it seems desirable to adjust $A$ and $B$ so that when a student possesses no information with respect to an item (i.e., all $p$'s are equal), his score will be zero. This may be accomplished by choosing a range, $K$, of possible scores and setting $A = 0.5K$ and $B = 0.5K \log k$, where, as before, $k$ is the number of possible answers. The score that the student will receive if answer $i$ is correct can now be written:

$$S(r_i) = 0.5K \log (k r_i), \qquad r_i \ge 0.01 .$$

A range of 100 points appears satisfactory for many applications. Figure 5 shows graphically the conditional scores for the case of two possible answers, while Fig. 6 shows selected conditional score triplets for the case of three possible answers. Notice that in the case of two alternatives, the maximum score obtainable is about 15, while the minimum score is about -85. In the case of three alternatives, the maximum is about 24 and the minimum is about -76. This difference in maximum and minimum scores is caused by the requirement that the scoring function be zero when all the responses are equal, but it may also be taken to indicate that prediction may be, in some sense, easier with two alternatives than with three.
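The quoted extremes can be reproduced from the score formula. The Python sketch below assumes a base-10 logarithm (that base is the one that yields the figures of about 15 and -85, and about 24 and -76), takes K = 100, and enforces the lower bound d = 0.01 by raising any smaller response to d; the function names are hypothetical. It also computes a score triplet of the kind shown for three answers, and confirms that the score is zero for equal responses and that the full range is K.

```python
import math


def item_score(r_correct, k, K=100, d=0.01):
    """Score 0.5*K*log10(k*r) for the response r given to the correct answer.

    Responses below d are treated as d, standing in for the rule r_i >= d.
    """
    r = max(r_correct, d)
    return 0.5 * K * math.log10(k * r)


def score_triplet(r, K=100, d=0.01):
    # Rounded scores received if answer 1, 2, or 3 (and so on) turns out correct.
    return tuple(round(item_score(ri, len(r), K, d)) for ri in r)


for k in (2, 3):
    best, worst = item_score(1.0, k), item_score(0.0, k)
    print(k, round(best), round(worst), round(best - worst), item_score(1 / k, k))

print(score_triplet((0.6, 0.3, 0.1)))
```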

For those special situations requiring the accurate measurement of very small probabilities, the lower bound $d$ may be set at a very much smaller value.

[Figure: conditional score curve for the case of two possible answers.]
Fig. 5--Conditional score functions for the case of two possible answers
Because $r_2 = 1 - r_1$, conditional score pairs may be found where $r_1 = r$.

[Figure: the equilateral probability triangle, vertex ANSWER 1 at the top, with selected response points labeled by their triplets of conditional scores.]
Fig. 6--Conditional score triplets (based on logarithmic scoring function) for some selected responses on the equilateral probability triangle

Notice, also, how the penalties tend to be larger than the rewards. This is a characteristic of all the admissible scoring rules, because the nonlinearity is required in order to induce matching behavior in the students. This characteristic of the scoring rule may have other implications, as illustrated by this quotation from a Rand staff member after experiencing computer-assisted admissible probability measurement as reported in [4].

One thought that occurred to me after I took (the) test was that, contrary to other tests, this one can also be a learning experience. The situation in which one is punished severely for emphatically stating what turns out to be wrong, more so than one is rewarded for what is right even if emphatically stated, is one that is closer to the reality situation of everyday life than the simple tests that look only for right or wrong. Thus, the test itself exercises a certain negative reinforcement against stating too strongly what one is not really sure about, and thus actually conditions a person to using what knowledge he has, and at the varying degree of certainty with which he commands it, judiciously. This will be of advantage to him in life. For it is a fact of life that a mistake stated with aplomb permanently reduces our credibility with others who must rely on our say-so, i.e., it makes us less likely to succeed in a job, for instance.

Thus (the) test is not only evaluative but educational.

Consider now the optimal test-taking strategy for a student who wishes to maximize expected test scores. As before, the total test score is simply the sum of the item scores, so expected test scores can be maximized by maximizing each expected item score, which may be expressed as


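$$\begin{aligned}
E[S(r_1), S(r_2), \ldots, S(r_k) \mid p_1, p_2, \ldots, p_k] &= E[S(\mathbf{r}) \mid \mathbf{p}] = \sum_{i=1}^{k} p_i\, S(r_i) \\
&= \sum_{i=1}^{k} p_i \,\big(0.5K \log(k\, r_i)\big) \\
&= 0.5K\Big(\log k + \sum_{i=1}^{k} p_i \log r_i\Big).
\end{aligned}$$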
This last form of the equation makes clear a relation between the logarithmic scoring rule and information theory. If a student responds with $r_i = p_i$ for all i, then his expected item score is proportional to a constant plus the amount of information he perceives that he possesses with respect to the item. This relation makes it easy to derive information measures from test scores based on the logarithmic scoring rule (cf. Sec. 8).
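The relation to information theory can be illustrated numerically. The sketch below (helper names are hypothetical; base-10 logarithms and the lower limit d = 0.01 are assumed as before) computes the expected item score of an honest response and compares it with 0.5K times the perceived information $\log k + \sum_i p_i \log p_i$.

```python
import math


def expected_item_score(p, r, K=100.0, d=0.01):
    """E[S(r) | p] = sum_i p_i * 0.5 K log10(k r_i), responses clipped below at d."""
    k = len(p)
    return sum(pi * 0.5 * K * math.log10(k * max(ri, d)) for pi, ri in zip(p, r))


def perceived_information(p):
    """log10 k + sum_i p_i log10 p_i (terms with p_i = 0 contribute nothing)."""
    k = len(p)
    return math.log10(k) + sum(pi * math.log10(pi) for pi in p if pi > 0)


p = [0.7, 0.2, 0.1]
honest = expected_item_score(p, p)  # student responds with r_i = p_i
print(honest, 0.5 * 100 * perceived_information(p))
```

The perceived information is zero for a uniform distribution and largest when all probability sits on one answer, which is why it can serve as an information measure.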

Should the student respond with his probabilities? More specifically, how does the logarithmic scoring rule induce the student to do this in order to maximize his expected scores? Figure 7 shows, for the case of two answers, expected scores for all possible responses for each of four different probability distributions, while Fig. 8 shows expected score contours for the case of three answers. Notice that for each probability distribution the largest expected score occurs where the response matches the probability distribution. With an admissible scoring rule such as the logarithmic, this is true not only for these selected distributions but for all possible probability distributions. This means that a student always suffers a loss in expected score whenever he deviates from the optimal test-taking strategy of setting $r_i = p_i$ for all i. Note further that the loss in expected score increases the more he deviates from this optimal strategy. For those instances in which the student has no knowledge about an item, i.e., all the p's are equal, if he pretends to have complete knowledge by setting one of the $r_i = 1$, he loses 35 points in expected score when there are two answers and almost 43 points when there are three answers. This feature of the logarithmic scoring rule may be expected to serve as a disincentive toward guessing-type behavior. More important, however, the logarithmic scoring rule can serve to induce an exact association of responses with probabilities. What other impact might it have upon student behavior?
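The matching property and the 35- and 43-point losses can be checked by brute force. The sketch assumes base-10 logarithms, K = 100, and responses scored as if clipped at 0.01; the grid search over two-answer responses is only an illustration, not the method used to draw Figs. 7 and 8.

```python
import math


def expected_item_score(p, r, K=100.0, d=0.01):
    """E[S(r) | p] under the logarithmic rule S(r_i) = 0.5 K log10(k max(r_i, d))."""
    k = len(p)
    return sum(pi * 0.5 * K * math.log10(k * max(ri, d)) for pi, ri in zip(p, r))


GRID = [i / 100 for i in range(101)]  # candidate values of r1 for two answers


def best_response(p1):
    """Grid value of r1 that maximizes expected score when p = (p1, 1 - p1)."""
    p = (p1, 1 - p1)
    return max(GRID, key=lambda r1: expected_item_score(p, (r1, 1 - r1)))


def loss_from_false_certainty(k):
    """Expected-score loss when an uninformed student puts all weight on one answer."""
    p = [1 / k] * k
    certain = [1.0] + [0.0] * (k - 1)
    return expected_item_score(p, p) - expected_item_score(p, certain)


for p1 in (0.5, 0.7, 0.9):
    print(p1, best_response(p1))
print(loss_from_false_certainty(2), loss_from_false_certainty(3))
```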


6.  MARSHALING  FACTS  AND  REASONS  BEFORE  RESPONDING 

Up to this point the decision analyses have taken the student's uncertainty (his probability distribution) as given, and then focused on finding that response which gives the highest possible expected score. There does come a time during the taking of any test when the student has to commit himself to some response. The optimal strategies derived above are appropriate to this problem and, thus, are designated test-taking strategies in the narrow sense.

The scope of the decision context must be enlarged, however, when it is considered that a student may have some control over his probability distribution for an item. For one example, while taking a test he can think more deeply about the questions and answers to bring more facts and reasons to bear upon the problem at hand. For another example, prior to taking a test he can study in order to gain additional information about the subject matter. What implications does the scoring rule have for these types of behavior on the part of a student?

Given that a student uses the optimal response strategy, $\mathbf{r}^*$, his optimal expected score, $E[S(\mathbf{r}^*) \mid \mathbf{p}] = \sum_{i=1}^{k} p_i S(r_i^*)$, can be computed for each possible probability distribution. Figure 9 shows this relation when there are two possible answers for both the simple choice (or linear) and the logarithmic scoring rules. Notice that as the student acquires information to move his probability away from the state of being uninformed ($p_1 = p_2 = 0.5$), the optimal expected score from the simple choice procedure increases in proportion to the distance moved along the probability scale, while that from the logarithmic scoring procedure increases only slightly at first and then more and more as higher levels of mastery are achieved. A similar effect is observed in the case of three possible answers, as shown in Figs. 10 and 11. Thus, the logarithmic procedure requires a higher level of mastery to yield any given optimal expected score (other than zero) than does the simple choice procedure and, in this sense, can serve as a more stringent incentive system for learning. In Sec. 11 we build a model to investigate this in more detail.
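The slow-then-fast growth described for the logarithmic rule can be seen numerically for two answers. In this sketch (assumptions: base-10 logarithms, K = 100, responses clipped at 0.01) the optimal expected score $E^*(p)$ is evaluated at a few probability levels and expressed as a share of its maximum. A score that grew in proportion to the distance from 0.5, as described for simple choice, would give shares of 20%, 60%, and 80% at the same points.

```python
import math


def optimal_expected_score(p1, K=100.0, d=0.01):
    """E*(p) for two answers when the student responds honestly, r = p."""
    return sum(pi * 0.5 * K * math.log10(2 * max(pi, d)) for pi in (p1, 1 - p1))


top = optimal_expected_score(1.0)
for p1 in (0.6, 0.8, 0.9):
    distance = (p1 - 0.5) / 0.5  # share of the way from 0.5 to 1.0
    share = optimal_expected_score(p1) / top
    print(f"p={p1}: {distance:.0%} of the distance, {share:.0%} of the maximum score")
```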

7.  DETECTING  BIAS  IN  THE  ASSIGNMENT  OF  PROBABILITIES 

The central theme so far has been concerned with the relation between a student's responses and his probabilities. The probabilities were taken as given, and the relation (if any) between the student's probabilities and the external world was reflected indirectly in the student's actual test score.
Fig. 9 - Optimal expected score as a function of probability in the case of two answers
Here, the focus will be on the assessment of probabilities themselves, i.e., on the relation between the student's probabilities and the facts and reasons leading to these probabilities. It should be recognized that there is a point beyond which this type of analysis cannot go. There exists no completely general descriptive or prescriptive theory of how to derive probabilities from facts and reasons. Even if such a theory did exist, there is at present no way of knowing what facts and reasons a student is aware of at a particular moment in time. Nevertheless, a number of powerful methods for the assessment of probabilities are currently available or under development.

The external validity graph is the most fundamental means of calibrating and operationally defining personal probabilities. Assume that a student taking a test is following the optimal test-taking strategy for the logarithmic scoring rule, so that $r = p$. Now let

$N(c \mid p)$ = number of correct answers assigned probability p,

and

$N(i \mid p)$ = number of incorrect answers assigned probability p.


Then

$$R(p) = \frac{N(c \mid p)}{N(c \mid p) + N(i \mid p)}$$


is the empirical success ratio conditional upon the probability assignment p. A student's probability assignments are perfectly valid if $R(p) = p$ for all p when the number of observations is increased without limit. Figure 12 illustrates an external validity graph.
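The success ratio $R(p)$ can be tabulated directly from a record of assignments. A minimal sketch, assuming each record is a hypothetical `(probability, was_correct)` pair with one record per answer rated, and that responses come from a discrete set of levels so they can be used as dictionary keys:

```python
from collections import defaultdict


def success_ratios(assignments):
    """Return {p: R(p)} where R(p) = N(c|p) / (N(c|p) + N(i|p)).

    `assignments` is an iterable of (p, is_correct) pairs, one for every
    answer the student rated, correct or not.
    """
    n_correct = defaultdict(int)
    n_incorrect = defaultdict(int)
    for p, is_correct in assignments:
        if is_correct:
            n_correct[p] += 1
        else:
            n_incorrect[p] += 1
    levels = sorted(set(n_correct) | set(n_incorrect))
    return {p: n_correct[p] / (n_correct[p] + n_incorrect[p]) for p in levels}


# Hypothetical record: p = 0.9 used ten times (nine correct), p = 0.5 used five times.
record = [(0.9, True)] * 9 + [(0.9, False)] + [(0.5, True)] * 3 + [(0.5, False)] * 2
print(success_ratios(record))
```

Plotting $R(p)$ against p and comparing it with the diagonal gives the external validity graph; the number of observations needed at every level is what makes the method so demanding of data.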

An external validity graph requires an inordinate amount of data before a student's probabilities can be calibrated. However, by placing some constraints on the relation between relative frequency and probability, it is possible to obtain some results with much less data. Suppose, now, that $R(p)$ tends to $q = ap + b$, $0 < q < 1$.

Fig. 12 - An external validity graph based on 28 15- and 20-item tests taken by one subject after receiving training in admissible probability measurement. All tests were composed of three-answer items. Legend: proportion assigned to correct answers; proportion assigned to incorrect answers. Horizontal axis: probability. Dashed line represents ideal match between relative frequency and probability.

To estimate this linear realism function, let $p_1 < p_2 < \cdots < p_L$ be the levels of probability that the student has assigned, and let

$u_i$ = number of times $p_i$ has been assigned to a correct answer, and

$v_i$ = number of times $p_i$ has been assigned to an incorrect answer.

A convenient estimation procedure is to find a and b so as to minimize

$$\sum_{i=1}^{L} (u_i + v_i)\left(\frac{u_i}{u_i + v_i} - a p_i - b\right)^2 .$$

The least-squares estimators are (see [15]):

$$\hat a = \frac{\sum (u_i + v_i) \sum u_i p_i - \sum (u_i + v_i) p_i \sum u_i}{\sum (u_i + v_i) p_i^2 \sum (u_i + v_i) - \left[\sum (u_i + v_i) p_i\right]^2},$$

$$\hat b = \frac{\sum (u_i + v_i) p_i^2 \sum u_i - \sum (u_i + v_i) p_i \sum u_i p_i}{\sum (u_i + v_i) p_i^2 \sum (u_i + v_i) - \left[\sum (u_i + v_i) p_i\right]^2}.$$
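The closed-form estimators translate directly into code. This sketch (function and argument names are illustrative) computes $\hat a$ and $\hat b$ from the levels $p_i$ and the counts $u_i$, $v_i$; the check uses synthetic counts generated exactly from a known line, so the fit should recover it.

```python
def realism_function(levels, u, v):
    """Weighted least-squares fit of u_i/(u_i+v_i) ~ a p_i + b, weights u_i + v_i.

    Returns the estimates (a, b) given by the closed-form expressions above.
    Counts may be fractional (e.g. when ties are split).
    """
    w = [ui + vi for ui, vi in zip(u, v)]
    sw = sum(w)
    swp = sum(wi * pi for wi, pi in zip(w, levels))
    swp2 = sum(wi * pi * pi for wi, pi in zip(w, levels))
    su = sum(u)
    sup = sum(ui * pi for ui, pi in zip(u, levels))
    denom = swp2 * sw - swp ** 2  # zero if only one level was used
    a = (sw * sup - swp * su) / denom
    b = (swp2 * su - swp * sup) / denom
    return a, b


levels = [0.1, 0.3, 0.5, 0.7, 0.9]
u = [10 * (0.8 * p + 0.1) for p in levels]  # synthetic: R(p) = 0.8 p + 0.1
v = [10 - ui for ui in u]
print(realism_function(levels, u, v))
```

Since the slope here is below one, this synthetic student would be read as overvaluing his information.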


As long as a reasonably wide range of p's is used by the student, this estimation procedure can yield fairly stable results with 15- and 20-item tests, so it represents a tremendous improvement in efficiency over the external validity graph. It should be noted that if the slope estimate $a > 1$, the student appears to be undervaluing his subject matter information, while if $a < 1$, the student is apparently overvaluing his information (see Fig. 13). This analysis of bias appears to be completely satisfactory for the case of just two possible answers to each test item. Where three or more answers are allowed, however, this analysis requires that each response/probability be independent of the others in the distribution. This is not necessarily true for all persons. For example, some people might tend to overvalue information when deducing reasons in favor of an answer, but tend to undervalue information when deducing reasons against an answer. In Appendix A we give a planar estimation procedure for the case of three possible answers. This procedure is capable of detecting the separate dimensions of bias.

The calibration results yielded by the realism function are related not only to Savage's conjecture quoted at the beginning of this report but also to a familiar saying of Confucius: "When you know a thing, to hold that you know it, and when you do not know a thing, to acknowledge that you do not know it. This is knowledge."


8.  PERCEIVED  VERSUS  ACTUAL  INFORMATION 

This aspect of student behavior may be explored further by computing (under the assumption of independence among test items) the amount of information the student perceives he possesses with respect to the subject matter of the test, as indicated by his probability assignment; i.e.,

$$n \log k + \sum_{j=1}^{n} \sum_{i=1}^{k} p_{ij} \log p_{ij} ,$$

where $p_{ij}$ is the probability assigned to answer i of item j. This quantity is proportional to the difference between the test score the student expects and the test score he would expect if he had no information relevant to the subject matter.

Fig. 13 - Two realism functions based on probability assignments for two-answer questions. Person I undervalues his information while person II overvalues his information.

The amount of information the student actually possesses with respect to the subject matter of the test may be estimated by substituting $\hat p_{ij} = \max[0, \min(1, a p_{ij} + b)]$ for the $p_{ij}$ in the above expression. Comparison of these two information measures reflects the extent and direction of student bias. This comparison may be made graphically in terms of the information square shown in Fig. 14, which has been drawn to illustrate the aptness here of the Arabian proverb:


He  who  knows  and  knows  that  he  knows. 

He  is  wise,  follow  him. 

He  who  knows  and  knows  not  that  he  knows. 

He  is  asleep,  awaken  him. 

He  who  knows  not  and  knows  not  that  he  knows  not. 
He  is  a fool,  shun  him. 

He  who  knows  not  and  knows  that  he  knows  not. 

He is a child, teach him.
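The two measures, perceived and actual information, can be computed side by side. This is a sketch under stated assumptions: base-10 logarithms, 0 log 0 taken as 0, and hypothetical realism parameters a = 0.6, b = 0.2, chosen so that a + 2b = 1 and each adjusted two-answer distribution still sums to one (the substitution itself does not guarantee this).

```python
import math


def information(distributions):
    """n log10 k + sum_j sum_i p_ij log10 p_ij, with 0 log 0 taken as 0."""
    total = 0.0
    for p in distributions:
        total += math.log10(len(p)) + sum(x * math.log10(x) for x in p if x > 0)
    return total


def adjust(distributions, a, b):
    """Apply p_hat = max[0, min(1, a p + b)] to every assigned probability."""
    return [[max(0.0, min(1.0, a * x + b)) for x in p] for p in distributions]


# Hypothetical two-answer assignments on a three-item test.
assigned = [[0.9, 0.1], [0.8, 0.2], [0.99, 0.01]]
perceived = information(assigned)
actual = information(adjust(assigned, a=0.6, b=0.2))
print(perceived, actual, perceived / actual)
```

A ratio of perceived to actual information above one, as here, is the overvaluing pattern.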
Fig. 14 - The information square


Under certain conditions, however, the information measures may be equal but the realism function reveals that the student is tending to overvalue his information. These instances tend to be extreme and even pathological, e.g., when a student tries to minimize his expected test score.
At Rand we have demonstrated, and tried out, computer-administered decision-theoretic testing with many different people, using as sample tests Reader's Digest vocabulary tests; Humanities, Natural Sciences, and Social Sciences items from a workbook for the College Level Examination Program tests; and a midterm postgraduate-level test in Econometrics. About halfway through these demonstrations we decided to begin keeping a permanent record of what people were doing at the terminal.

Figure 15 compares the two information measures for the first test taken by each of 66 people. Most of the data points fall below the diagonal, indicating that most of the "subjects" at least initially overvalue their knowledge of these subject matter areas. A few people fall close to the diagonal, suggesting that some people may exist who can discriminate with great accuracy what they know well from what they know less well.

What happens when people take more tests and, thus, gain more experience with decision-theoretic testing? We find that many people can reduce their score loss due to lack of realism [4]. I think that this improvement comes as they begin to experience the consequences of the admissible scoring system [6] and learn to reduce their risk-taking tendencies by making their utilities more nearly linear in points earned or lost. There does, however, appear to be a limit to this improvement.

A number of people were encouraged or challenged to take more tests, and to try to be as realistic and to score as well as they possibly could. We ended up with 11 subjects who took an appreciable number of tests, enough so that we could discard the early ones they took while they were learning the procedures and the consequences of the admissible scoring system.

Figure 16 shows the apparently stable-state behavior of the most biased of the 11 subjects. The line designated $I_a$ is located at the mean of the actual information measures, while the line designated $I_p$ is located at the mean of the perceived information measures. The intersection of the two lines gives a gross indication of actual versus perceived information for those tests the subject decided to attempt. By taking the ratio of $I_p$ to $I_a$ we can obtain a rough measure of the extent and direction of bias. The ratio for this subject is 2.44, indicating that she thought that she had almost two and one-half times as much information as she actually had.


Fig. 16 - Information comparisons for subject A, the most biased subject. Early tests excluded. Data shown for last 18 tests taken by subject.
Table 3 lists some personal characteristics for the 11 subjects, arranged in decreasing order of bias, which goes down almost to the unbiased value of 1.00. Notice that no subject yielded an overall ratio less than one, which would have indicated a person who typically undervalued his information. Figure 17 compares the information measures for subjects B through K. Subject B, although apparently striving to reduce bias and to improve his score, was unable to do so. The remaining subjects, depicted in decreasing order of bias, were more and more often successful in producing a realistic assessment of their uncertainty. Subjects J and K, the two most accurate subjects, were remarkably consistent in demonstrating their ability to assess their uncertainties accurately.


Table 3

SUBJECT CHARACTERISTICS

Subject  Ip/Ia   Ia     Tests  Sex    Age    Education
A        2.44    0.31   18     Woman  20-30  Master's +
B        2.42    0.17   12     Man    30-40  Doctorate
C        2.26    0.28    7     Man    50-60  Doctorate
D        2.11    0.32   27     Woman  20-30  Bachelor's
E        1.81    0.18   20     Woman  20-30  Some college
F        1.67    0.40   12     Woman  50-60  Doctorate
G        1.52    0.30   20     Woman  30-40  Bachelor's
H        1.33    0.35    9     Girl   9      Third grade
I        1.22    0.38   21     Girl   12     Fifth grade
J        1.02    0.71   34     Man    40-50  Doctorate
K        1.00+   0.85    8     Man    40-50  Doctorate

In conclusion, the introduction of decision-theoretic testing makes it possible to define and to measure for the first time a human ability, call it realism, which may prove to be a very important determinant of individual and team performance. For example, to what extent and in what manner is an unrealistic student handicapped in his attempts to learn and to study effectively? For another example, does a team of realistic people tend to outperform a team of overvaluing people and, if so, for what types of tasks? Answers to these and many other questions must await further research.


Fig. 17 - (Continued)


We have shown here that some people can be very realistic over a wide range of subject matter while others characteristically overvalue their information. We do not yet know what deficits in this ability exist within different subgroups of the population, nor do we know exactly how to go about educating people to become more realistic. The results for subject A, summarized in Fig. 16, certainly prove that level of education does not ensure realism in assessing and communicating uncertainty.
9. THE CONSEQUENCES OF BIASED PROBABILITIES

Decomposing the test score provides a convenient means for showing a student the consequences of having less than perfect realism in assessing the value of information. It is also related to a major, but little known, property of an admissible scoring system: a student's actual test score is maximized if and only if his responses match the conditional success ratios defined in Sec. 7. Thus, the effect of experience upon a student who desires to score well on admissible probability tests should be in the direction of making his responses conform more closely to the conditional success ratios. In other words, the student should develop his ability to give better probabilistic predictions.
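The property just stated can be illustrated for one class of two-answer items. Suppose, hypothetically, that a student answered u + v items with the same response (r, 1 - r), and that the answer given weight r was correct u times; the actual score from those items is u S(r) + v S(1 - r), and a grid search shows that it peaks at the success ratio u/(u + v). Base-10 logarithms and clipping at 0.01 are assumed.

```python
import math


def class_score(r, u, v, K=100.0, d=0.01):
    """Actual score from u + v two-answer items all answered (r, 1 - r),
    where the answer weighted r was correct u times and the other v times."""
    def s(x):
        return 0.5 * K * math.log10(2 * max(x, d))
    return u * s(r) + v * s(1 - r)


grid = [i / 100 for i in range(101)]
best_r = max(grid, key=lambda r: class_score(r, u=7, v=3))
print(best_r)  # compare with the success ratio 7 / (7 + 3)
```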

The maximum test score obtainable on an n-item test with the logarithmic scoring rule is $M(n) = n(0.5K \log k)$, while the minimum score is $m(n) = n(0.5K \log 0.01k)$ because of the restriction on r. If S(n) is the total test score earned by a student, then $M(n) - S(n)$ is the amount of improvement left in order to achieve perfect mastery of the test, and when $K = 100$ this total improvement score can range between 0 and 100n, since $M(n) - m(n) = 0.5nK \log 100 = nK$. Thus, one function served by the use of an improvement score is the elimination of negative scores.
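The bounds and the 0-to-100n range of the improvement score can be verified directly. The sketch assumes base-10 logarithms (so that log 100 = 2) and the lower limit 0.01 on responses; `score_bounds` is a hypothetical helper.

```python
import math


def score_bounds(n, k, K=100.0, d=0.01):
    """Return (M(n), m(n)): best and worst totals on an n-item, k-answer test."""
    best = n * 0.5 * K * math.log10(k)       # every correct answer given r = 1
    worst = n * 0.5 * K * math.log10(d * k)  # every correct answer given r = d
    return best, worst


for k in (2, 3, 4):
    M, m = score_bounds(n=20, k=k)
    # M - m should equal 100 n = 2000 for every k when K = 100
    print(k, round(M, 2), round(m, 2), round(M - m, 6))
```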

This total improvement score may now be broken down into two scores, each of which has a meaningful interpretation. Suppose the test is rescored using the adjusted probabilities $\hat p_i$, computed from the student's realism function as described above, instead of the student's actual responses $r_i$. This procedure yields a new score, $\hat S(n)$, which typically is greater than or equal to S(n).† The adjusted score $\hat S(n)$ is an estimate of the score the student could have made if he were unbiased* and made more effective use of the information available to him. Now, $\hat S(n) - S(n)$ represents the improvement possible through more effective use by the student of the information already available to him, while $M(n) - \hat S(n)$ represents the improvement possible as a result of his gaining additional information pertaining to the subject matter of the test. These two improvement scores are a decomposition of the total improvement score because, when summed, they equal it. Such an analysis, of course, is not possible with the simple choice method.

* For a student who is biased in assessing uncertainty, i.e., $p \ne R(p)$, we have the possibility of conflict between maximizing expected score and maximizing actual test score. While of profound importance, a detailed treatment of this subject is beyond the scope of this report. The conflict is resolved, of course, if the student is able to change his probabilities to match the conditional success ratios.

† Recall that the realism function is only a least-squares fit to the data. If the realism function were fitted using a maximum likelihood procedure, the logarithmic score would be strictly maximized and there would be more assurance that $\hat S(n) \ge S(n)$.
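The decomposition can be sketched as follows. The responses, correct answers, and realism parameters a = 0.6, b = 0.2 are hypothetical; base-10 logarithms and clipping at 0.01 are assumed, and only the adjusted probability of the correct answer is needed for rescoring.

```python
import math


def item_score(r_correct, k, K=100.0, d=0.01):
    """Logarithmic score for the weight placed on the correct answer."""
    return 0.5 * K * math.log10(k * max(r_correct, d))


def decompose(responses, correct, a, b, K=100.0):
    """Split M(n) - S(n) into (S_hat - S) + (M - S_hat).

    responses: list of response vectors r_j; correct: index of the correct
    answer for each item; (a, b): the student's realism function.
    """
    n, k = len(responses), len(responses[0])
    M = n * 0.5 * K * math.log10(k)
    S = sum(item_score(r[c], k, K) for r, c in zip(responses, correct))
    S_hat = sum(item_score(max(0.0, min(1.0, a * r[c] + b)), k, K)
                for r, c in zip(responses, correct))
    return M - S, S_hat - S, M - S_hat


responses = [[0.9, 0.1], [0.95, 0.05], [0.8, 0.2], [0.99, 0.01]]
correct = [0, 1, 0, 0]
total, better_use, new_information = decompose(responses, correct, a=0.6, b=0.2)
print(total, better_use, new_information)
```

In this made-up case most of the gap comes from one confidently wrong answer, which the realism adjustment penalizes far less.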

10.  A LIKELIHOOD  RATIO  MEASURE  OF  PERSPICACITY 

Realism appears to be an important goal for human behavior. There is some indication, however, that it may not be sufficient as an ideal. For example, by using complex strategies which sacrifice potential test score, a student might be able to produce a realism function with a slope nearer to one. This kind of pseudorealism must not be produced at the expense of test score, and if the proper emphasis is placed upon score, it probably will not be.

For another example, there is the question of a student's ability to discriminate levels and patterns of uncertainty. To illustrate, consider some data from a 15-item, three-answer test. Figure 18 shows the 15 probability distributions elicited from a student inexperienced in explicitly assessing uncertainty. It appears that this student was thinking in terms of which answer was most likely to be correct and, as a result, responded along the line going from the no-information point up to complete information. Figure 19 shows the 15 probability distributions elicited from a student with considerably more experience in explicitly assessing uncertainty. It appears that this student would sometimes use information to "rule out" one of the answers and perform other kinds of complex discriminations yielding a variety of probability distributions.

Consider now using just one probability distribution to represent each student's knowledge. Let $p'_j$ be the highest probability assigned for item j, $p''_j$ the next highest, and $p'''_j$ the smallest. The average probability distribution $(\bar p', \bar p'', \bar p''')$ may be found by calculating

$$\bar p' = \frac{1}{n}\sum_{j=1}^{n} p'_j, \qquad \bar p'' = \frac{1}{n}\sum_{j=1}^{n} p''_j, \qquad \bar p''' = \frac{1}{n}\sum_{j=1}^{n} p'''_j .$$

Fig. 19 - Probability distributions (ignoring permutations among the answer labels) used by a highly trained subject taking one 15-item test and yielding a likelihood ratio of 36.55. Circle represents average probability distribution.
This average probability distribution is displayed as a circle in Figs. 18 and 19. Notice that the averages are not strikingly different for the two students.

Which set of probability distributions, the original set or the average one used for all items, is the better predictor of the set of correct answers? To be more specific, consider the "data" to be the sequence of correct answers and let $p_{cj}$ be the original probability assigned by the student to the correct answer to item j. Then the likelihood of the data under the hypothesis that they were generated by the student's probability distributions is


$$L_1 = \prod_{j=1}^{n} p_{cj}$$


Now consider the hypothesis that the data were generated by the constant average probability distribution. That is, look at $p_{cj}$ and give it the value $\bar p'$, $\bar p''$, or $\bar p'''$ according to whether it was the largest, middle, or smallest probability in the set. Or, equivalently, let

$n'$ = the number of times $p_{cj}$ was largest,

$n''$ = the number of times $p_{cj}$ was next largest, and

$n'''$ = the number of times $p_{cj}$ was the smallest, so that

$n' + n'' + n''' = n$.

If there are ties among the $p_{cj}$, fractional numbers must be used. The likelihood of the data under this second hypothesis can be written as

$$L_2 = (\bar p')^{n'} (\bar p'')^{n''} (\bar p''')^{n'''} .$$
The likelihood ratio can now be computed as $L_1/L_2$. For the data shown in Fig. 18 this likelihood ratio is about 0.2, indicating that the data are about five times more likely under the constant probability hypothesis. For the data shown in Fig. 19 this likelihood ratio is about 37, indicating that the data were about 37 times more likely using the student's original set of varying probability distributions than using the constant average probability distribution. Thus, this likelihood ratio may prove to be a useful measure of a student's progress in learning how to extract and process information in probabilistic terms.
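The two likelihoods and their ratio can be computed as below. This sketch assumes every probability given to a correct answer is positive (for example, because responses are clipped at 0.01), works with logarithms to avoid underflow on long tests, and splits ties fractionally across the tied ranks, which is one possible reading of the instruction to use fractional counts. The data are hypothetical.

```python
import math


def likelihood_ratio(distributions, correct):
    """L1 / L2: the student's own distributions versus the constant
    average distribution (p-bar', p-bar'', ...) applied by rank."""
    n = len(distributions)
    ranked = [sorted(p, reverse=True) for p in distributions]
    k = len(ranked[0])
    average = [sum(r[i] for r in ranked) / n for i in range(k)]

    log_l1 = log_l2 = 0.0
    for p, rank_order, c in zip(distributions, ranked, correct):
        p_c = p[c]
        log_l1 += math.log(p_c)
        # ranks at which p_c appears; tied ranks share the count fractionally
        ranks = [i for i, x in enumerate(rank_order) if x == p_c]
        log_l2 += sum(math.log(average[i]) for i in ranks) / len(ranks)
    return math.exp(log_l1 - log_l2)


# Hypothetical three-answer data: confident when right, hedged when unsure.
dists = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25]]
print(likelihood_ratio(dists, correct=[0, 1]))
```

A student who always uses the same distribution gets a ratio of exactly one, so values well above one reward genuine discrimination between items.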

11.  POTENTIAL  IMPACT  OF  TESTING  METHOD  UPON  STUDY  BEHAVIOR 

Because lower levels of mastery often require much less effort to achieve than do the higher levels, the logarithmic scoring rule may prove to be a very appropriate reward system that can motivate students to achieve higher levels of mastery of a subject matter than they do at present.

To investigate this, assume that the student has, for each question, an exponential "learning curve" of the form

$$p = 1 - \tfrac{1}{2}\exp(-2\lambda c),$$

where c represents the cost to the student in time and energy, say, of the effort he puts into studying the question; $\lambda$ is a parameter that reflects the "easiness" or rate of learning of the subject matter of the question; and p is the student's probability associated with the correct answer. For the sake of definiteness and simplicity, assume that each question has only two possible answers. Thus, if the student puts no study at all into the question (i.e., $c = 0$), his probability for the correct answer is 0.5, but as he invests effort in studying the subject matter his probability increases asymptotically toward 1.0, as illustrated in Fig. 20.
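The learning curve and its inverse, which gives the effort needed to reach a given probability, are easy to write down. A sketch with hypothetical function names; $\lambda$ and c are in whatever units the student's effort is measured in.

```python
import math


def probability_after_study(c, lam):
    """p(c) = 1 - 0.5 exp(-2 lam c): 0.5 with no study, rising toward 1.0."""
    return 1.0 - 0.5 * math.exp(-2.0 * lam * c)


def effort_needed(p, lam):
    """Inverse of p(c): c = -ln[2(1 - p)] / (2 lam), valid for 0.5 <= p < 1."""
    return -math.log(2.0 * (1.0 - p)) / (2.0 * lam)


for c in (0.0, 0.5, 1.0, 2.0):
    print(c, round(probability_after_study(c, lam=1.0), 4))
```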

Fig. 20 - Probability as a function of effort, c, where $p = 1 - \tfrac{1}{2}\exp(-2\lambda c)$

There are two ways of modeling the way a student will choose to spend his study time and effort. You may either assume that he has a fixed amount of time available and seeks to allocate it across the questions in such a way as to maximize his optimal expected score; or you may assume that there is some "exchange rate" between study time and score (e.g., one point of score is worth three minutes of time to this particular student) and that he will "spend" his time on each question in such a way as to maximize his "profit," i.e., the difference between his optimal expected score on a question and the value of the time he expends on studying it. These approaches will be discussed separately, but it will become apparent that their solutions are closely related.

11.1 Allocation of Study Effort Among Topics

First, suppose that the student has a fixed and limited amount of study
time available and wishes to allocate it over the questions likely to be
asked in such a way that he will maximize his optimal expected score. On
a given question, by following the optimal test-taking strategy he will
expect to score

$$E^* = E[S(r^*) \mid p(c)],$$

where $p(c)$ is the function of study time and effort defined above.
Figure 21 shows optimal expected score as a function of effort for a
single question under both the simple-choice (or linear) and the
logarithmic scoring procedures. The maximum return (in terms of expected
score) per unit of effort may be found graphically by measuring the slope
of the steepest line through the origin that is tangent to the optimal
expected score function, $E^*$. Analytically, it can be determined by
finding the point where the derivative of $E^*/c$ with respect to $c$ is
zero. Now in fact, since $dp/dc = 2\lambda(1-p)$ and
$2\lambda c = -\log[2(1-p)]$ for the learning curve above,

$$\frac{d}{dc}\left(\frac{E^*}{c}\right) = \frac{1}{c}\,\frac{dE^*}{dp}\,\frac{dp}{dc} - \frac{E^*}{c^2} = \frac{-(1-p)\log[2(1-p)]\,\dfrac{dE^*}{dp} - E^*}{c^2}.$$

Because of the particular form chosen for $p(c)$, it follows that the
numerator of this expression depends on $p$ alone, not on $c$ or
$\lambda$. Thus, for any given scoring rule there exists a "critical
value" of $p$, say $p^*$, such that on any question, and regardless of
what $\lambda$ may be, the student gets the maximum reward per unit effort
by studying just enough to bring his probability for the correct answer
up to $p^*$.
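The claim that $p^*$ does not depend on $\lambda$ can be checked by brute force. The sketch below assumes the logarithmic rule normalized as in Appendix B, $S(r) = \log_2(2r)$, so that $E^* = p\log_2(2p) + (1-p)\log_2[2(1-p)]$; it grid-searches $c$ for the largest $E^*/c$ at a few arbitrary values of $\lambda$ and reports the probability reached there. Under these assumptions the maximizing $p$ should come out near 0.891 each time, and the best ratio should scale in proportion to $\lambda$.

```python
import math


def expected_log_score(q: float) -> float:
    """Optimal expected score E* under the log rule S(r) = log2(2r), where q = 1 - p."""
    p = 1.0 - q
    return p * math.log2(2.0 * p) + q * math.log2(2.0 * q)


def best_return_per_effort(lam: float, c_max: float = 10.0, steps: int = 100_000):
    """Grid-search c in (0, c_max] for the largest E*/c; return (p at that c, best ratio)."""
    best_p, best_ratio = 0.5, -math.inf
    for k in range(1, steps + 1):
        c = c_max * k / steps
        q = 0.5 * math.exp(-2.0 * lam * c)  # q = 1 - p(c), computed directly for accuracy
        ratio = expected_log_score(q) / c
        if ratio > best_ratio:
            best_p, best_ratio = 1.0 - q, ratio
    return best_p, best_ratio


for lam in (0.25, 1.0, 4.0):
    p_star, rate = best_return_per_effort(lam)
    print(f"lambda={lam}: p at max E*/c = {p_star:.4f}, (max E*/c)/lambda = {rate / lam:.4f}")
```

The grid is deliberately crude; Appendix B locates the same point analytically.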

It is easy to calculate $p^*$ for any given scoring rule (see Appendix B).
To be specific:

    SCORING RULE                  CRITICAL PROBABILITY
    Simple choice or linear       0.5
    Logarithmic                   0.891...

An allocation procedure that yields an approximately optimal solution
to the overall problem (and an exactly optimal solution in most cases)
is as follows. Arrange the questions in order of increasing study
difficulty, so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.
The student should work on the first question until he has expended
enough effort that $p \ge p^*$ and the ratio of marginal return to
marginal cost (that is, $dE^*/dc$) is just equal to the maximal
achievable gain per unit effort on the second question. Then he should
work on the second question until $p \ge p^*$, and then work on the first
and second questions (keeping their marginal return ratios equal) until
these marginal return ratios equal the maximal achievable gain per unit
effort on the third question. The process is continued until the student
has allocated all the effort he has available.

This allocation procedure will yield the true optimum for the scoring
rules considered above if the student "runs out of gas" at a point where
every question he has worked on at all has been worked on to a point
where $p \ge p^*$. In more complex, nonreproducing scoring procedures that
do not have steadily diminishing marginal returns for $p \ge p^*$, this
allocation procedure will not work so well.

Now, obviously, a "real-life" student will not go through a careful
quantitative analysis of how to allocate his study efforts, but the
quantitative model (which may come to represent the behavior of
experienced, test-wise students fairly well) does capture one aspect of
study behavior that is worth remarking: the use of a logarithmic scoring
rule encourages the student to study fewer questions to a higher degree
of mastery, while the conventional simple-choice procedure encourages
the study of more questions to a lower degree of mastery. Which
incentive system is to be preferred depends upon the tradeoffs between
scope and retention of the subject matter for the particular learning
situation at hand.
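To see the scope-versus-mastery effect numerically, the sketch below solves a discretized version of the fixed-budget problem exactly by dynamic programming (a standard knapsack recursion, swapped in here for the step-by-step procedure described above). Everything in it is a hypothetical choice: four equally easy questions with $\lambda = 1$, a budget of one unit of effort spent in steps of 0.05, simple choice scored +1/-1 (so $E^* = 2p - 1$, as in Example 3 of Appendix B), and the logarithmic rule normalized as in Appendix B. Under these assumptions the simple-choice rule spreads the effort over all four questions, while the logarithmic rule puts all of it into a single question and reaches a much higher $p$ there.

```python
import math


def prob_correct(c: float, lam: float) -> float:
    return 1.0 - 0.5 * math.exp(-2.0 * lam * c)


def linear_score(p: float) -> float:
    # Best simple choice scored +1 right / -1 wrong: expected score 2p - 1
    return 2.0 * p - 1.0


def log_score(p: float) -> float:
    # Optimal expected score under the log rule S(r) = log2(2r)
    q = 1.0 - p
    return p * math.log2(2.0 * p) + q * math.log2(2.0 * q)


def allocate(lams, budget_units, unit, expected):
    """Knapsack DP over discretized effort; returns the effort given to each question."""
    best = [0.0] * (budget_units + 1)  # best total score using at most b units so far
    choices = []
    for lam in lams:
        value = [expected(prob_correct(k * unit, lam)) for k in range(budget_units + 1)]
        new_best, choice = [], []
        for b in range(budget_units + 1):
            k_best = max(range(b + 1), key=lambda k: best[b - k] + value[k])
            new_best.append(best[b - k_best] + value[k_best])
            choice.append(k_best)
        best = new_best
        choices.append(choice)
    alloc, b = [], budget_units  # walk the stored choices backwards to recover the plan
    for choice in reversed(choices):
        alloc.append(choice[b])
        b -= choice[b]
    return [k * unit for k in reversed(alloc)]


lams = [1.0, 1.0, 1.0, 1.0]  # four equally easy questions (hypothetical)
for name, rule in (("simple choice", linear_score), ("log", log_score)):
    plan = allocate(lams, budget_units=20, unit=0.05, expected=rule)
    mastery = [round(prob_correct(c, 1.0), 3) for c in plan]
    print(name, [round(c, 2) for c in plan], mastery)
```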

Neither incentive system is beyond fault when study time is strictly
limited. On the one hand, use of the conventional simple-choice
procedure may mean that the student will remember none of the subject
matter more than a few hours or days after he takes the test. On the
other hand, if the logarithmic procedure is used, he may remember some of
the subject matter, but not enough for it to be of any use to him.
"Cramming" for a test can easily be a losing proposition which, with the
simple-choice procedure, yields an adequate test score but produces
little learning.

11.2 Investment of Study Effort in a Single Topic

An alternative way of modeling the student's study incentives is to
assume that his study time is not strictly limited and that his time has
a value to him which is commensurable with the value of the test score he
may earn. If the total amount of time which he may spend on study is
flexible, he would perhaps attempt to maximize his "profit" on each test
question. That is to say, he would choose an expenditure of time $c$ on
each question that maximizes $E[S(r^*) \mid p(c)] - sc$, where $s$ is the
value, in units of test score, of a single unit of time (or study
effort). Assume for the moment that the units of time (or study effort)
have been normalized in such a way that $s = 1$.

Within the context of the quantitative model it is an easy task to
calculate (see Appendix C), as a function of $\lambda$, the optimal
investment strategy and maximal profit under both the simple-choice and
the logarithmic scoring rules. The results of these calculations are
graphed in Fig. 22. For a given $\lambda$ the simple-choice procedure
allows the larger profit and, in this sense, is a more lenient reward
system than is the logarithmic. Under the simple-choice procedure it
never pays to work on a question where $\lambda < 0.5$, while under the
logarithmic rule the student cannot make a profit if $\lambda < 1.5$. If
$\lambda > 1.5$, the student will expend considerably more effort under
the logarithmic scoring rule.
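A sketch of the "maximum profit" calculation under the same assumptions as before (log rule $S(r) = \log_2(2r)$, simple choice scored +1/-1, time valued at $s = 1$): for each $\lambda$ it grid-searches the effort that maximizes $E^* - c$. It is a numerical stand-in, not the Appendix C computation. With these choices the simple-choice rule stops paying at $\lambda = 0.5$, and the logarithmic rule breaks even at about $\lambda \approx 1.51$, which the text rounds to 1.5.

```python
import math


def expected_linear(q: float) -> float:
    # Simple choice scored +1 / -1: expected score 2p - 1 = 1 - 2q, with q = 1 - p
    return 1.0 - 2.0 * q


def expected_log(q: float) -> float:
    # Log rule S(r) = log2(2r): optimal expected score with p = 1 - q
    p = 1.0 - q
    return p * math.log2(2.0 * p) + q * math.log2(2.0 * q)


def best_investment(lam, expected, c_max=5.0, steps=50_000):
    """Grid-search the effort c >= 0 maximizing profit E*(p(c)) - c (time valued at s = 1)."""
    best_c, best_profit = 0.0, expected(0.5)  # c = 0 leaves p = 1/2 and zero profit
    for k in range(1, steps + 1):
        c = c_max * k / steps
        profit = expected(0.5 * math.exp(-2.0 * lam * c)) - c
        if profit > best_profit:
            best_c, best_profit = c, profit
    return best_c, best_profit


for lam in (0.4, 1.0, 1.5, 1.55, 2.0):
    c_lin, _ = best_investment(lam, expected_linear)
    c_log, _ = best_investment(lam, expected_log)
    print(f"lambda={lam}: effort (simple choice) = {c_lin:.3f}, effort (log) = {c_log:.3f}")
```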

Note, by the way, that if the student studies a question at all under
the "maximum profit" hypothesis, he studies it at least up to the level
where his probability exceeds $p^*$. (At a profitable optimum the
marginal return $dE^*/dc$ equals $s = 1$ while the average return $E^*/c$
is at least 1; below $p^*$ the average return is still rising and so lies
below the marginal return, which rules out any optimum there.)

Thus, the same basic pattern appears under the "maximum profit"
hypothesis as under the "optimal allocation" hypothesis. Specifically,
the student is theoretically motivated to study fewer questions (through
avoidance of the harder ones with $\lambda < 1.5$) but to a higher degree
of mastery under the logarithmic scoring rule than under the
conventional simple-choice procedure. In the case of the investment
problem, however, the student may be induced to study all of the
questions by increasing the reward for learning or by increasing the
rate of learning ($\lambda$), either through improving learning
efficiency or through reorganization of the subject matter. Any of these
steps may serve to resolve the conflict between scope of learning and
retention.

Whether these effects will be observable in real students in real-life
situations will be an interesting matter to investigate empirically.


12. IMPACT OF INAPPROPRIATE REWARDS UPON TEST-TAKING BEHAVIOR

A fundamental assumption underlying all of the above analyses of
optimal behavior is that the student wishes to maximize his expected
test score. What may happen when this condition is relaxed?

With the simple choice procedure, a student desiring to maximize
expected test score does so by selecting, for each question on the test,
the answer he considers most likely to be correct, as shown in Sec. 5.1.
Suppose, however, that a cutting score or some grading limits are
imposed on the test so that the student now wishes to maximize the
probability that his test score will equal or exceed a specified score,
say $N$ or more answers correct.
To find the optimal test-taking strategy under this reward structure,
assume that the student perceives all the questions to be independent.
That is to say, he feels that the probable correctness of the answers to
one question would not be affected by what the correct answer turns out
to be on another question. Now let

    $P_{i(j),j}$ = probability of getting question $j$ correct given that
        the student chooses answer $i(j)$,
    $P_\ell(K)$ = probability of getting $K$ correct out of the first $\ell$
        questions,
    $P_\ell(K+)$ = probability of getting $K$ or more correct out of the
        first $\ell$ questions.

Then

$$\begin{aligned} P_n(N+) &= \sum_{h=N}^{n} P_n(h) \\ &= \sum_{h=N}^{n} \left\{ P_{i(n),n}\,P_{n-1}(h-1) + \left[1 - P_{i(n),n}\right] P_{n-1}(h) \right\} \\ &= P_{i(n),n}\,P_{n-1}(N-1) + P_{n-1}(N+). \end{aligned}$$

Since $P_{n-1}(N-1) \ge 0$ regardless of what strategy the student uses on
the first $n-1$ questions, and since (by independence) neither
$P_{n-1}(N-1)$ nor $P_{n-1}(N+)$ depends on his choice on the $n$th
question, it follows that choosing $i(n)$ so that $P_{i(n),n}$ is a maximum
will give the student an equal or better chance of getting $N$ or more
correct than will any other choice on the $n$th question. Clearly, the
questions could be renumbered to make any question the "$n$th question,"
and thus the obvious strategy is, indeed, an optimal one.
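The recursion above is also the standard way to compute the probability of $N$ or more correct answers for independent items. The sketch below builds the distribution of the number correct one question at a time and then, for a small hypothetical test, checks by brute force over all answer choices that picking the most likely answer on every question does as well as any other choice.

```python
from itertools import product


def prob_at_least(probs, n_needed):
    """P(at least n_needed correct) for independent questions with success probs `probs`."""
    dist = [1.0]  # dist[h] = P(exactly h correct among the questions processed so far)
    for pj in probs:
        new = [0.0] * (len(dist) + 1)
        for h, w in enumerate(dist):
            new[h] += w * (1.0 - pj)  # question j answered wrong
            new[h + 1] += w * pj      # question j answered right
        dist = new
    return sum(dist[n_needed:])


# Hypothetical test: the student's probability for each answer of three questions
beliefs = [(0.7, 0.3), (0.6, 0.4), (0.2, 0.5, 0.3)]
N = 2

obvious = [max(b) for b in beliefs]  # choose the most likely answer everywhere
best_any = max(prob_at_least(choice, N) for choice in product(*beliefs))
print(prob_at_least(obvious, N), best_any)
```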

The assumption of independence among the test items was used in the
proof given above. Consider now an example showing that this result
does, in fact, depend on the assumption of independence. Here is the
test:

1. It rained in Santa Monica on July 24, 1932. True or False?

2. It did not rain in Santa Monica on July 24, 1932. True or False?

You must get at least one item right to pass the test. Obviously, if you
answer both items "True" or both items "False" you are certain to pass.
If you are 90 percent certain that it did not rain in Santa Monica on
July 24, 1932 and you use the "obvious" strategy (answering item 1
"False" and item 2 "True"), then there is a 10 percent chance that you
will flunk. This shows that the obvious strategy is not necessarily
optimal if the questions are not independent.
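The Santa Monica example can be checked by enumerating the two possible states of the weather. In this sketch the student's 90 percent belief that it did not rain is the only input; each pair of answers is scored against both states.

```python
from itertools import product

P_RAIN = 0.1  # the student's probability that it rained (he is 90 percent sure it did not)


def prob_pass(answer1: bool, answer2: bool, need: int = 1) -> float:
    """P(at least `need` items right); item 1 says 'It rained', item 2 'It did not rain'."""
    total = 0.0
    for rained, prob in ((True, P_RAIN), (False, 1.0 - P_RAIN)):
        right = (answer1 == rained) + (answer2 == (not rained))
        if right >= need:
            total += prob
    return total


for a1, a2 in product((True, False), repeat=2):
    print(f"item 1 {a1!s:5}  item 2 {a2!s:5}  P(pass) = {prob_pass(a1, a2):.2f}")
```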

Be that as it may, the simple-choice procedure is relatively
insensitive to the reward structure within which it is embedded. As a
consequence of this property of the widely used simple-choice scoring
procedure, test givers have probably gotten in the habit of ignoring
reward structures and can afford to use cutoff scores and prizes with
abandon. Such behavior can cause great difficulty when one attempts to
improve testing through the elicitation of personal probabilities.

The notion that the student should answer each question in such a way
as to maximize his expected score is based upon the assumption that he
has a linear utility for points. In many educational contexts as they
currently exist, this assumption will be manifestly out of line with the
facts.

For example, suppose that some special prize is to be given to whoever
gets the best score on a given test. This will tend to make students
overstate their probabilities (or, to put it another way, to appear to
overvalue their information), because the chance of getting a really
high score will be worth more than the risk of getting an unusually low
score (which will be no worse for the student than a mediocre score).
The precise quantitative measurement of this effect is very difficult in
general, because it involves a multiperson game that is affected not
only by each player's perception of the difficulty of the questions but
also by his perception of the ability of the other players. However, an
analysis of what happens if two players are asked a single question will
be found in [7], pp. 12-13.

The special case in which a prize is awarded only in the event that the
student makes a perfect score is very easy to understand. With this
reward structure the student should always set one of the $r_i = 1$, no
matter how great his uncertainty, because if he fails to do so, he will
foreclose any possibility of making a perfect score.

Another context in which students might be motivated to give responses
other than their personal probabilities is any situation in which all
that matters is to achieve a given level of score. For example, if the
students are on a "pass-fail" system, where they pass the course if they
achieve a certain test score or better, and fail the course otherwise,
then they may have considerable incentive to shade their responses up or
down from their probabilities. The general problem of determining an
optimal response strategy under these circumstances is mathematically
very complex, and no solution is known.

The following simplified example, however, can be solved, and it
illustrates very clearly how the imposition of a "pass-fail" reward
structure on top of a reproducing scoring system may completely destroy
any incentive for students to respond with their probabilities.

Suppose a student faces an exam consisting of $n$ two-answer items.
Suppose these questions all "look alike" to the student, in the sense
that on each question he has a fixed probability distribution, $p$ and
$1 - p$, with $p > 1/2$. Suppose that he requires a total score $T$ on the
test in order to pass. He wants to choose a fixed response $r$ to assign
to the preferred answer to each question. What value of $r$ should he
choose in order to maximize his probability of passing the test? It is
not hard to show (see Appendix D) that he will have the maximal
probability of passing if he chooses $r$ such that $E[S(r) \mid r] = T/n$.
Note that this $r$ does not depend on $p$ at all! So the student's optimal
test-taking strategy depends only on what score he must make in order to
pass, and not on his level of knowledge with respect to each test item.
In short, this reward structure utterly destroys the reproducing
character of the scoring rule. Figure 23 illustrates the student's
probability of passing as a function of his response strategy in the
particular case where $T = 0.58$, $n = 20$, and $p = 0.8$. Note that the
student will be about nine times as likely to fail the test if he
pursues the "maximum expected value" strategy as he will be if he follows
the "maximum probability of passing" strategy.
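The pass-fail effect can be reproduced with an exact binomial calculation. The sketch assumes the logarithmic rule normalized as in Appendix B, $S(r) = \log_2(2r)$, and uses hypothetical values ($n = 20$ items, pass mark $T = 12$ points) rather than the Fig. 23 case. For a fixed response $r$, passing requires at least some number $k(r)$ of items on which the preferred answer is right; because $k(r)$ does not involve $p$, neither does the best choice of $r$. The code also solves $E[S(r) \mid r] = T/n$ by bisection, and this $r$ attains the best passing probability.

```python
import math


def S(r: float) -> float:
    """Normalized log rule for a two-answer item: S(1/2) = 0, S(1) = 1."""
    return math.log2(2.0 * r)


def min_right_needed(r, n, T):
    """Fewest items with the preferred answer right that reach total T, else None."""
    for k in range(n + 1):
        if k * S(r) + (n - k) * S(1.0 - r) >= T:
            return k
    return None


def pass_prob(r, n, p, T):
    """Exact P(total >= T) when r goes on the preferred answer of every item."""
    k = min_right_needed(r, n, T)
    if k is None:
        return 0.0
    return sum(math.comb(n, h) * p**h * (1 - p) ** (n - h) for h in range(k, n + 1))


def solve_target(target, lo=0.5, hi=1.0 - 1e-12):
    """Bisection for r in [1/2, 1) with E[S(r)|r] = r S(r) + (1-r) S(1-r) = target."""
    f = lambda r: r * S(r) + (1 - r) * S(1 - r) - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


n, T = 20, 12.0                       # hypothetical: 20 items, pass mark 12 points
grid = [i / 100 for i in range(50, 100)]
r0 = solve_target(T / n)
for p in (0.7, 0.8, 0.9):
    best_r = max(grid, key=lambda r: pass_prob(r, n, p, T))
    print(f"p={p}: truthful r passes w.p. {pass_prob(p, n, p, T):.4f}; "
          f"best grid r={best_r} passes w.p. {pass_prob(best_r, n, p, T):.4f}; "
          f"r0={r0:.3f} passes w.p. {pass_prob(r0, n, p, T):.4f}")
```

Under these assumptions a student who truthfully reports $r = 0.7$ can never reach 12 points, while the response given by $E[S(r) \mid r] = T/n$ still leaves him a small chance of passing.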
In an actual situation, however, the reproducing character of the
scoring rule would not be completely washed out, because the student
would not have precisely the same probability distribution for each
item. It seems intuitively evident (although a rigorous proof has not
yet been discovered) that his best strategy would be to hedge all his
responses but still let his responses vary somewhat with his
probabilities.

But the best remedy is to avoid creating reward structures which put a
highly nonlinear value on points earned under an allegedly reproducing
scoring rule. Another (partial) remedy is to avoid letting the student
know how many questions there are on a test, or how difficult they are,
before he begins to take it.

13. SUMMARY AND CONCLUSIONS

We have seen that it is patently desirable to broaden the responses
that students are permitted to make to multiple-choice questions. The
reasons for this are as follows: the student is then able to transmit
more information to the teacher on each item; conventional
multiple-choice tests do nothing to train the student to weigh the
strength of conviction justified by his knowledge on a given item; and
students themselves prefer greater freedom of response and chafe under
the limitations of the conventional one-choice response format.

However, it is meaningless or even deceptive to permit students to give
a weighted response rather than a unitary choice if the scoring system
is not carefully chosen so as to encourage students to use the full
range of choice available to them. For example, if the student is
allowed to respond with weights (which add up to one over all
alternative responses on each question) and is then given a score on
each question equal to the weight he ascribed to the correct
alternative, then it will not take an intelligent student long to
recognize that he should not utilize the freedom you have made available
to him, but should simply respond with weights of zero and one as in a
conventional multiple-choice test. One excellent solution to this
problem appears to be the use of "admissible scoring systems," which are
designed to provide the student with a maximum expected score if he
makes his responses correspond to his subjective probabilities.
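The point about scoring by the weight placed on the correct answer is easy to check on a two-answer item. In the sketch below (the belief $p = 0.7$ is hypothetical) the expected score under that linear rule is maximized at the extreme response $r = 1$, whereas under the logarithmic rule, normalized as in Appendix B, it is maximized at $r = p$.

```python
import math


def expected(rule, r, p):
    """Expected score on a two-answer item: weight r on answer A, 1 - r on B; A right w.p. p."""
    return p * rule(r) + (1.0 - p) * rule(1.0 - r)


def linear_rule(r):   # score = weight placed on the correct answer
    return r


def log_rule(r):      # normalized logarithmic rule, an admissible (reproducing) rule
    return math.log2(2.0 * r)


p = 0.7  # the student's actual belief that answer A is correct (hypothetical)
best_linear = max((i / 1000 for i in range(0, 1001)), key=lambda r: expected(linear_rule, r, p))
best_log = max((i / 1000 for i in range(1, 1000)), key=lambda r: expected(log_rule, r, p))
print(best_linear, best_log)  # linear rule: extreme response; log rule: response equal to p
```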
Admissible scoring procedures have many desirable features. They link
the student's responses to the well-developed disciplines of subjective
probability, information theory, and Bayesian decisionmaking. The
student who becomes "test-wise" against a reproducing scoring system has
learned to express his uncertainty in the universal language of
probability theory. He has also learned to weigh the facts, clues, and
reasons available to him and come up with a "risk-balancing" response.
Preliminary data from computer-administered admissible probability
testing show that, while some people possess this aptitude, others are
quite biased in their assessment of uncertainty and could benefit
greatly from further training in this skill. Admissible scoring
procedures also have the theoretical advantage that they lead the
student toward higher degrees of mastery than do conventional scoring
procedures. That is to say, the student perceives increased rewards for
higher degrees of certainty on each question under an admissible scoring
system than under a conventional multiple-choice scoring system. The
latter tends to encourage superficial knowledge of a wide variety of
topics; the former encourages total mastery of a smaller number of
topics. This perception should have a desirable effect on the student's
study habits. Whether this effect will be observed in practice makes an
interesting topic for future experiments.

It will be very important, in practical applications of admissible
probability testing, to ensure that the external incentive system (i.e.,
what is done with the test scores) is consistent with the basic
assumption of admissible probability testing. That is to say, the
students must perceive the maximization of expected score as being their
best strategy. In theory, the use of a "pass-fail" system, or the use of
extreme competition, may have the effect of distorting the students'
responses away from their true subjective probabilities. The value of
the students' rewards must be somehow proportional to the total score
they each receive. Whether this problem turns out to be serious or not
is another question that can be answered only by empirical tests.
Appendix A

FITTING A PLANAR REALISM FUNCTION


1. NOTATION, NORMALIZATION, AND SYMMETRY

Assume there are $n$ questions in the test, with three possible answers
for each question. We reorder the answers so that
$p_1^j \ge p_2^j \ge p_3^j$, where $p_i^j$ is the probability the student
ascribes to the $i$th answer on the $j$th question. We let $i_j$ denote
the correct answer on the $j$th question.

We wish to find a linear transformation

$$\begin{aligned} q_1 &= a_{11}p_1 + a_{12}p_2 + a_{13}p_3 \\ q_2 &= a_{21}p_1 + a_{22}p_2 + a_{23}p_3 \\ q_3 &= a_{31}p_1 + a_{32}p_2 + a_{33}p_3 \end{aligned} \tag{A.1}$$

that will minimize the quantity

$$\Delta = \sum_{j=1}^{n} \sum_{i=1}^{3} \left(q_i^j - e_i^j\right)^2, \tag{A.2}$$

where $e_i^j$ is zero or one depending upon whether answer $i$ to the $j$th
question is incorrect or correct.

In addition to minimizing expression (A.2), we require the
transformation to meet certain conditions of normality and symmetry:

    (A) If $\sum_{i=1}^{3} p_i = 1$, then $\sum_{i=1}^{3} q_i = 1$.

    (B) If $p_1 = p_2$, then $q_1 = q_2$.

    (C) If $p_2 = p_3$, then $q_2 = q_3$.

All three of these conditions together imply that
$p = (\tfrac13, \tfrac13, \tfrac13)$ is carried into
$q = (\tfrac13, \tfrac13, \tfrac13)$ by our transformation; thus for all $i$,

$$\sum_{k=1}^{3} a_{ik} = 1. \tag{A.3}$$

Condition A, applied in turn to $p = (1,0,0)$, $p = (0,1,0)$, and
$p = (0,0,1)$, implies that for all $i$,

$$\sum_{k=1}^{3} a_{ki} = 1. \tag{A.4}$$

Condition B, applied to $(\tfrac12, \tfrac12, 0)$, implies

$$a_{11} + a_{12} = a_{21} + a_{22}. \tag{A.5}$$

Condition C, applied to $(1,0,0)$, implies

$$a_{21} = a_{31}. \tag{A.6}$$

Now let us denote $a_{11}$ by $\alpha$, and $a_{13}$ by $\beta$. From (A.6) and
(A.4) we see that

$$a_{21} = a_{31} = \frac{1 - \alpha}{2}. \tag{A.7}$$

From (A.3) we see that

$$a_{12} = 1 - \alpha - \beta. \tag{A.8}$$

From (A.5), (A.7), and (A.8) we have

$$a_{22} = \frac{1 + \alpha - 2\beta}{2}. \tag{A.9}$$

From (A.7), (A.9), and (A.3) we derive

$$a_{23} = \beta. \tag{A.10}$$

Applying (A.4) now yields

$$a_{32} = \frac{\alpha + 4\beta - 1}{2} \quad\text{and}\quad a_{33} = 1 - 2\beta. \tag{A.11}$$

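In summary, the application of normality and symmetry conditions shows
that system (A.1) may be written

$$\begin{aligned} q_1 &= \alpha p_1 + (1 - \alpha - \beta)p_2 + \beta p_3 \\ q_2 &= \frac{1 - \alpha}{2}p_1 + \frac{1 + \alpha - 2\beta}{2}p_2 + \beta p_3 \\ q_3 &= \frac{1 - \alpha}{2}p_1 + \frac{\alpha + 4\beta - 1}{2}p_2 + (1 - 2\beta)p_3. \end{aligned} \tag{A.12}$$

It is easy to see that these expressions are necessary and sufficient
conditions for (A), (B), and (C) to hold.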

The parameters $\alpha$ and $\beta$ have an immediate interpretation, as
follows. The requirement that $p_1 \ge p_2 \ge p_3$ means that we are
restricting our attention to one-sixth of the "answer triangle" (the
shaded area in the upper left-hand triangle of Fig. 24). The mapping
(A.12) leaves the point $(\tfrac13, \tfrac13, \tfrac13)$ fixed, carries
$(1, 0, 0)$ into $(\alpha, \tfrac{1-\alpha}{2}, \tfrac{1-\alpha}{2})$, and
carries $(\tfrac12, \tfrac12, 0)$ into
$(\tfrac{1-\beta}{2}, \tfrac{1-\beta}{2}, \beta)$. These three points are the
vertices of the shaded triangle, and by knowing what happens to them it
is easy to visualize what happens to all other points in the triangle.
Figure 24 includes three examples of what the mappings look like for
different values of $\alpha$ and $\beta$.
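A minimal sketch of the transformation (A.12), with hypothetical values $\alpha = 0.8$ and $\beta = 0.05$. It prints the images of the three vertices of the shaded triangle, which should be $(\tfrac13, \tfrac13, \tfrac13)$, $(\alpha, \tfrac{1-\alpha}{2}, \tfrac{1-\alpha}{2})$ and $(\tfrac{1-\beta}{2}, \tfrac{1-\beta}{2}, \beta)$.

```python
def realism_matrix(alpha: float, beta: float):
    """Coefficient matrix a_ik of the transformation (A.12)."""
    return [
        [alpha, 1 - alpha - beta, beta],
        [(1 - alpha) / 2, (1 + alpha - 2 * beta) / 2, beta],
        [(1 - alpha) / 2, (alpha + 4 * beta - 1) / 2, 1 - 2 * beta],
    ]


def transform(alpha, beta, p):
    """q = M p for an ordered probability vector p = (p1, p2, p3)."""
    m = realism_matrix(alpha, beta)
    return [sum(m[i][k] * p[k] for k in range(3)) for i in range(3)]


alpha, beta = 0.8, 0.05  # hypothetical parameter values
for vertex in [(1 / 3, 1 / 3, 1 / 3), (1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]:
    print(vertex, [round(x, 4) for x in transform(alpha, beta, vertex)])
```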

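2. MINIMIZATION OF $\Delta$

This section gives the formulas required to calculate values of $\alpha$
and $\beta$ that will minimize $\Delta$, the quantity defined by (A.2). The
formulas are derived by taking the derivatives of $\Delta$ with respect to
$\alpha$ and $\beta$ and setting them to zero. We skip the intermediate
steps in this routine calculation and jump directly to our final
formulas:

$$\alpha = \frac{C_\alpha C_{\beta\beta} - C_\beta C_{\alpha\beta}}{C_{\alpha\alpha} C_{\beta\beta} - C_{\beta\alpha} C_{\alpha\beta}}, \qquad \beta = \frac{C_\beta C_{\alpha\alpha} - C_\alpha C_{\beta\alpha}}{C_{\alpha\alpha} C_{\beta\beta} - C_{\beta\alpha} C_{\alpha\beta}}. \tag{A.13}$$

The quantities appearing in these formulas are defined as follows. Let

$$\begin{aligned} A_1 &= p_1 - p_2, & B_1 &= p_3 - p_2, & C_1 &= p_2, \\ A_2 &= \frac{p_2 - p_1}{2}, & B_2 &= p_3 - p_2, & C_2 &= \frac{p_1 + p_2}{2}, \\ A_3 &= \frac{p_2 - p_1}{2}, & B_3 &= 2(p_2 - p_3), & C_3 &= \frac{p_1 - p_2}{2} + p_3. \end{aligned} \tag{A.14}$$

The reader will note that

$$q_i = \alpha A_i + \beta B_i + C_i. \tag{A.15}$$

Now, writing $A_i^j$, $B_i^j$, $C_i^j$ for these quantities computed from the
probabilities given on the $j$th question, we define

$$\begin{aligned} C_{\alpha\alpha} &= \sum_{j=1}^{n}\sum_{i=1}^{3} \left(A_i^j\right)^2, \qquad C_{\beta\beta} = \sum_{j=1}^{n}\sum_{i=1}^{3} \left(B_i^j\right)^2, \qquad C_{\alpha\beta} = C_{\beta\alpha} = \sum_{j=1}^{n}\sum_{i=1}^{3} A_i^j B_i^j, \\ C_\alpha &= \sum_{j=1}^{n} A_{i_j}^j - \sum_{j=1}^{n}\sum_{i=1}^{3} A_i^j C_i^j, \qquad C_\beta = \sum_{j=1}^{n} B_{i_j}^j - \sum_{j=1}^{n}\sum_{i=1}^{3} B_i^j C_i^j. \end{aligned} \tag{A.16}$$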
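The fitting step can be sketched as follows. Because $\Delta$ is quadratic in $\alpha$ and $\beta$, (A.13) is Cramer's rule applied to the two normal equations, and its denominator vanishes only for degenerate data (every response with $p_1 = p_2$, or every response with $p_2 = p_3$). The sketch reorders each response so that $p_1 \ge p_2 \ge p_3$, accumulates the sums of (A.16) from the components of (A.14), and solves (A.13); the response data are hypothetical. The helper `delta` evaluates (A.2) directly, so the fitted values can be compared against nearby ones.

```python
def order_response(raw, correct):
    """Sort so p1 >= p2 >= p3; return the sorted probabilities and the new index i_j."""
    order = sorted(range(3), key=lambda i: -raw[i])
    return [raw[i] for i in order], order.index(correct)


def components(p):
    """A_i, B_i, C_i of (A.14), so that q_i = alpha*A_i + beta*B_i + C_i (A.15)."""
    p1, p2, p3 = p
    A = (p1 - p2, (p2 - p1) / 2, (p2 - p1) / 2)
    B = (p3 - p2, p3 - p2, 2 * (p2 - p3))
    C = (p2, (p1 + p2) / 2, (p1 - p2) / 2 + p3)
    return A, B, C


def fit_alpha_beta(responses):
    """Least-squares alpha, beta via (A.16) and (A.13); responses = [(raw probs, correct index)]."""
    caa = cbb = cab = ca = cb = 0.0
    for raw, correct in responses:
        p, ic = order_response(raw, correct)
        A, B, C = components(p)
        caa += sum(a * a for a in A)
        cbb += sum(b * b for b in B)
        cab += sum(a * b for a, b in zip(A, B))
        ca += A[ic] - sum(a * c for a, c in zip(A, C))
        cb += B[ic] - sum(b * c for b, c in zip(B, C))
    det = caa * cbb - cab * cab
    return (ca * cbb - cb * cab) / det, (cb * caa - ca * cab) / det


def delta(alpha, beta, responses):
    """The fitting criterion (A.2) evaluated at the given alpha, beta."""
    total = 0.0
    for raw, correct in responses:
        p, ic = order_response(raw, correct)
        A, B, C = components(p)
        for i in range(3):
            e = 1.0 if i == ic else 0.0
            total += (alpha * A[i] + beta * B[i] + C[i] - e) ** 2
    return total


# Hypothetical responses: (probabilities as given, index of the correct answer)
data = [((0.6, 0.3, 0.1), 0), ((0.25, 0.5, 0.25), 1), ((0.4, 0.2, 0.4), 2),
        ((0.8, 0.1, 0.1), 0), ((0.2, 0.7, 0.1), 0), ((0.34, 0.33, 0.33), 2)]
alpha, beta = fit_alpha_beta(data)
print(alpha, beta, delta(alpha, beta, data))
```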
Recall that $i_j$ is the subscript of the probability ascribed to the
correct answer on the $j$th question. Now the reader may be bothered by
the following question: suppose a respondent on the first question lists
probabilities (0.4, 0.2, 0.4), and the third answer is in fact correct.
We reorder the probabilities to get (0.4, 0.4, 0.2), but what value do we
take for $i_1$? Should it be 1 or 2? It does not matter, as far as
calculating our coefficients $\alpha$ and $\beta$ is concerned, for $i_j$
enters into the calculation only as a subscript for the $A$'s and $B$'s.
When this ambiguity arises about whether $i_1 = 1$ or $i_1 = 2$, we have
$p_1^1 = p_2^1$, and so $A_1^1 = A_2^1$ and $B_1^1 = B_2^1$. Similarly, if
there is ambiguity over whether $i_1 = 2$ or $i_1 = 3$, we have
$A_2^1 = A_3^1$ and $B_2^1 = B_3^1$. Because of the symmetry built into
Eq. (A.12), it also makes no difference which possible value of $i_j$ we
select (where ambiguity exists) in calculating the total score awarded to
the "transformed" estimates.

3. TRUNCATION AND RENORMALIZATION

The procedure above does not necessarily lead to a vector
$(q_1, q_2, q_3)$ that is a proper probability vector. Although the $q$'s
will sum to one, they will not necessarily fall between zero and one. We
truncate and renormalize in the obvious way:

$$q_i^* = \frac{\min(1, \max(0, q_i))}{d}, \qquad d = \sum_{i=1}^{3} \min(1, \max(0, q_i)). \tag{A.17}$$

This truncation may seem rather arbitrary and ad hoc. Recall, however,
that it will take place only if the respondent underestimates his
knowledge (and $\alpha$ and $\beta$ do not fall between zero and one), a
phenomenon that so far has occurred only rarely and with naive subjects.
Therefore, the use of a more sophisticated truncation and
renormalization routine hardly seems justified.
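Equation (A.17) in code, as a minimal sketch: clip each transformed value to $[0, 1]$ and divide by the clipped total.

```python
def truncate_and_renormalize(q):
    """Apply (A.17): clip each q_i to [0, 1], then rescale so the three values sum to one."""
    clipped = [min(1.0, max(0.0, qi)) for qi in q]
    d = sum(clipped)  # positive whenever the q_i sum to one, as they do under (A.12)
    return [c / d for c in clipped]


print(truncate_and_renormalize([1.1, 0.0, -0.1]))  # clipped to (1, 0, 0), d = 1
print(truncate_and_renormalize([0.7, 0.4, -0.1]))  # clipped to (0.7, 0.4, 0), d = 1.1
```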
Appendix B

HOW TO CALCULATE THE VALUE OF "p" AT WHICH MAXIMUM
EXPECTED RETURN PER UNIT EFFORT IS ACHIEVED


Recall that in Sec. 11 we assumed, for a given true-false question, that
a student's probability, $p$, of choosing the correct response could be
expressed as an exponential function of the effort he put into studying
the question. Specifically, we assumed that

$$p = 1 - \tfrac12 \exp(-2\lambda c), \tag{B.1}$$

where $c$ represents the study time ("cost") and $\lambda$ is a parameter
reflecting the "easiness" of the question. Recall also that the expected
score a student is able to make on a question can be expressed as a
function of his probability of choosing the correct response.
Specifically,

$$E[S(p) \mid p] = pS(p) + (1-p)S(1-p). \tag{B.2}$$

We assume in formula (B.2) that the scoring function is symmetric; i.e.,
the student gets exactly as much credit for "0.7 true; 0.3 false" if
"true" is correct as he gets for "0.3 true; 0.7 false" if "false" is
correct. By combining (B.1) with (B.2) we may express maximum expected
score directly as a function of "cost," $c$, and "easiness," $\lambda$:

$$E[\lambda, c] = \left(1 - \tfrac12 e^{-2\lambda c}\right) S\!\left(1 - \tfrac12 e^{-2\lambda c}\right) + \tfrac12 e^{-2\lambda c}\, S\!\left(\tfrac12 e^{-2\lambda c}\right). \tag{B.3}$$

Now, it is immediately evident that $E[1, \lambda c] = E[\lambda, c]$ for
all positive values of $\lambda$ and $c$. If we are looking for the
maximum return per unit effort, we may apply this observation as follows:
$$\max_{c > 0} \frac{E[\lambda, c]}{c} = \max_{c > 0} \lambda\,\frac{E[1, \lambda c]}{\lambda c} = \lambda \max_{x > 0} \frac{E[1, x]}{x}. \tag{B.4}$$

In other words, if $c_\lambda^*$ represents the cost that maximizes
$E[\lambda, c]/c$, then $\lambda c_\lambda^* = c_1^*$. This is true, in an
obvious sense, even if the $c^*$ are not unique. In our learning model,
$p$ depends on the product of $\lambda$ and $c$. This is extremely
convenient, for if we let $p_\lambda^* = 1 - \tfrac12\exp(-2\lambda c_\lambda^*)$
denote the probability reached at $c_\lambda^*$, we see that
$p_\lambda^* = p_1^*$. In other words, the maximum return per unit effort
is achieved on a given true-false question by studying that question
until a given probability of choosing the correct answer is achieved;
and this "mastery level" (which we shall call $p^*$) does not depend on
the easiness of the question, but only on the scoring function used.
This critical mastery level is thus a characteristic of the scoring
function; presumably students will study harder when faced with a
scoring function with a high critical mastery level than they will when
faced with a scoring function having a low one.

If the scoring function is differentiable, elementary calculus may be
used to calculate the value of $p^*$. One good way to do this is to use
the chain rule, as follows:

$$\frac{d}{dc}\left(\frac{E[S(p) \mid p]}{c}\right) = \frac{1}{c}\,\frac{dE}{dp}\,\frac{dp}{dc} - \frac{E}{c^2}. \tag{B.5}$$


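Example 1: Let $S(p) = \log(2p)/\log 2$. This is the logarithmic scoring
rule normalized so that $S(\tfrac12) = 0$ and $S(1) = 1$. Then

$$E = \frac{p\log(2p) + (1-p)\log[2(1-p)]}{\log 2}, \qquad \frac{dE}{dp} = \frac{\log\left(\dfrac{p}{1-p}\right)}{\log 2}, \qquad \frac{dp}{dc} = \lambda\exp(-2\lambda c) = 2(1-p)\lambda. \tag{B.6}$$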
Therefore,

$$c^{2}\log(2)\,\frac{d}{dc}\left(\frac{E}{c}\right) = c\cdot\log\left(\frac{p}{1-p}\right)\cdot 2(1-p)\lambda - p\log(2p) - (1-p)\log\bigl(2(1-p)\bigr) . \qquad \text{(B.7)}$$

Since

$$c = -\frac{\log\bigl(2(1-p)\bigr)}{2\lambda} , \qquad \text{(B.8)}$$

we see that (B.7) may be expressed as

$$c^{2}\log(2)\,\frac{d}{dc}\left(\frac{E}{c}\right) = -\log\left(\frac{p}{1-p}\right)\log\bigl(2(1-p)\bigr)(1-p) - p\log(2p) - (1-p)\log\bigl(2(1-p)\bigr) . \qquad \text{(B.9)}$$

The maximum value of $E/c$ will be achieved at the point where the right-hand side of (B.9) equals zero. Solving this transcendental equation is very difficult by hand, but is easy (using the method of false position or Newton's method) on any computer. The derivative in (B.9) is zero at $p = \tfrac{1}{2}$; is positive for $\tfrac{1}{2} < p < 0.8910751\ldots$; and is negative for $0.8910751\ldots < p$. Thus the maximum expected score per unit effort is achieved for $p = 0.8910751\ldots$.
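As a sketch of the computer solution mentioned above, the following applies the method of false position (here with the common "Illinois" modification, which keeps one endpoint from stalling) to the right-hand side of (B.9), using natural logarithms; the factor $\log 2$ on the left does not affect the sign. The bracket $[0.6, 0.999]$ is an assumption chosen to exclude the trivial zero at $p = \tfrac{1}{2}$, and the function names are hypothetical.

```python
import math


def b9_rhs(p: float) -> float:
    """Right-hand side of (B.9) with natural logs; same sign as d/dc (E/c)."""
    q = 1.0 - p
    return (-math.log(p / q) * math.log(2.0 * q) * q
            - p * math.log(2.0 * p)
            - q * math.log(2.0 * q))


def false_position(f, a: float, b: float, tol: float = 1e-13, max_iter: int = 200) -> float:
    """Method of false position with the Illinois modification.

    [a, b] must bracket a sign change of f.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    side = 0
    c = a
    for _ in range(max_iter):
        c = (a * fb - b * fa) / (fb - fa)   # secant through the bracket ends
        fc = f(c)
        if abs(fc) < 1e-15 or abs(b - a) < tol:
            break
        if fc * fb > 0:        # sign change lies in [a, c]: replace b
            b, fb = c, fc
            if side == -1:     # b replaced twice in a row: halve the stale f(a)
                fa /= 2.0
            side = -1
        else:                  # sign change lies in [c, b]: replace a
            a, fa = c, fc
            if side == 1:      # a replaced twice in a row: halve the stale f(b)
                fb /= 2.0
            side = 1
    return c


if __name__ == "__main__":
    # Bracket chosen to exclude the trivial zero at p = 1/2.
    print(false_position(b9_rhs, 0.6, 0.999))
```

The call `false_position(b9_rhs, 0.6, 0.999)` should land close to the value 0.8910751 quoted above.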

Example 2: Let $S(p) = 1 - 4(1-p)^{2}$. This is the "quadratic scoring system," or "Brier score," often used by meteorologists to evaluate the quality of probabilistic weather predictions. In our case, we normalize it so that $S(\tfrac{1}{2}) = 0$ and $S(1) = 1$. Then

$$E = (2p-1)^{2}, \qquad \frac{dE}{dp} = 8p - 4, \qquad \frac{dp}{dc} = 2(1-p)\lambda , \qquad \text{(B.10)}$$

so that

$$c^{2}\,\frac{d}{dc}\left(\frac{E}{c}\right) = c\cdot(8p-4)\cdot 2(1-p)\lambda - (2p-1)^{2} . \qquad \text{(B.11)}$$


Using relation (B.8), we see that

$$c^{2}\,\frac{d}{dc}\left(\frac{E}{c}\right) = \Bigl[-4\log\bigl(2(1-p)\bigr)(1-p) - (2p-1)\Bigr](2p-1) . \qquad \text{(B.12)}$$

Simple calculation shows that the derivative in (B.12) is zero at $p = \tfrac{1}{2}$, positive for $\tfrac{1}{2} < p < 0.85766593\ldots$, and negative for $p > 0.85766593\ldots$. Thus the maximum expected score per unit effort against the quadratic scoring system is achieved at $p = 0.85766593\ldots$. In short, since its critical mastery level is lower, the quadratic scoring system is apparently slightly less effective (theoretically) than the logarithmic scoring system in stimulating students to work hard on individual questions.

Example 3: The techniques of this appendix may also be applied to scoring systems that are not admissible. For example, suppose a student is approaching a true-false test that is to be marked and graded in the traditional way (+1 for a right answer; -1 for a wrong one). Then if a student has probability $p$ of selecting the right answer, his expected score on the question will be $p\cdot(+1) + (1-p)\cdot(-1)$. We therefore have

$$E = 2p - 1, \qquad \frac{dE}{dp} = 2, \qquad \frac{dp}{dc} = 2(1-p)\lambda , \qquad \text{(B.13)}$$

$$c^{2}\,\frac{d}{dc}\left(\frac{E}{c}\right) = c\cdot 2\cdot 2(1-p)\lambda - (2p-1) . \qquad \text{(B.14)}$$

It follows, using (B.8), that

$$c^{2}\,\frac{d}{dc}\left(\frac{E}{c}\right) = -2(1-p)\log\bigl(2(1-p)\bigr) - (2p-1) . \qquad \text{(B.15)}$$

The derivative in (B.15) is negative for $\tfrac{1}{2} < p < 1$. It follows that $E/c$ is a maximum at $p = \tfrac{1}{2}$. In other words, the maximum return per unit effort against a conventional true-false question is achieved by putting forth an infinitesimal amount of effort.
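The same kind of root search confirms the quadratic and conventional cases. The sketch below uses plain bisection (a simpler substitute for false position) on the bracketed factor of (B.12), which carries the sign of the derivative once $p > \tfrac{1}{2}$, and spot-checks on a grid that (B.15) is negative between $\tfrac{1}{2}$ and 1. The function names, bracket and grid spacing are illustrative choices.

```python
import math


def quadratic_factor(p: float) -> float:
    """Bracketed factor of (B.12); for p > 1/2 it has the sign of d/dc (E/c)."""
    q = 1.0 - p
    return -4.0 * math.log(2.0 * q) * q - (2.0 * p - 1.0)


def conventional_rhs(p: float) -> float:
    """Right-hand side of (B.15) for conventional +1/-1 marking."""
    q = 1.0 - p
    return -2.0 * q * math.log(2.0 * q) - (2.0 * p - 1.0)


def bisect(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:   # sign change in [lo, mid]
            hi = mid
        else:                   # sign change in [mid, hi]
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("quadratic p* ~", bisect(quadratic_factor, 0.6, 0.99))
    # (B.15) should be negative at every grid point p = 0.501, ..., 0.999
    print("conventional negative on grid:",
          all(conventional_rhs(0.5 + k / 1000) < 0 for k in range(1, 500)))
```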




Appendix  C 

ALLOCATING  STUDY  EFFORT  TO  MAXIMIZE  PROFIT 


In Appendix B we analyzed the problem of how much study would get the maximum return per unit effort. Another approach to the question of how students will be motivated to study is to suppose that study effort and points gained on a test question can be measured in commensurable terms. The student is then in the position of "purchasing" expected score with study effort. You might expect the student to attempt to maximize his "profit"; that is, to try to make the difference between the value of the score he expects to gain and the value (to him) of the effort he expends in study as large as possible. In short, he will try to maximize

$$E(\lambda, c) - c . \qquad \text{(C.1)}$$


If the reward system is such that $E$ is a differentiable function of $p$, then this maximization problem may be solved by finding the point at which the derivative is zero:

$$\frac{dE}{dp}\,\frac{dp}{dc} - 1 = 0 . \qquad \text{(C.2)}$$


We assume, as in Appendix B, that

$$p = 1 - \tfrac{1}{2}\exp(-2\lambda c), \qquad \frac{dp}{dc} = \lambda\exp(-2\lambda c) = 2\lambda(1-p) . \qquad \text{(C.3)}$$


Combining (C.2) with (C.3) we see that the derivative of profit will be zero where

$$2(1-p)\lambda\,\frac{dE}{dp} = 1 . \qquad \text{(C.4)}$$


Although we would ordinarily think of fixing $\lambda$ and then solving for $p$ (or, what is the same thing, $c$), Expression (C.4) is such an easy formula that the best way to derive numerical values seems to be to regard $p$ as a parameter, take $\lambda(p) = 1/\bigl(2(1-p)\,dE/dp\bigr)$ from (C.4), and derive the maximum expected profit as a function of $\lambda$ by plotting the curve

$$\left(\lambda(p),\; E(p) + \frac{\log\bigl(2(1-p)\bigr)}{2\lambda(p)}\right) .$$
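A minimal sketch of this parametric approach, assuming only the learning model (C.3): for each chosen $p$, (C.4) gives $\lambda(p)$, inverting the learning model gives the cost $c = -\log(2(1-p))/(2\lambda)$, and the profit is $E(p) - c$. The function name is hypothetical; the caller supplies $E$ and $dE/dp$ for whichever scoring system is being studied, and conventional $+1/-1$ marking ($E = 2p - 1$) is used here only as an illustrative input.

```python
import math
from typing import Callable, Tuple


def profit_curve_point(p: float,
                       E: Callable[[float], float],
                       dE_dp: Callable[[float], float]) -> Tuple[float, float, float]:
    """Return (lambda, cost, profit) for a chosen mastery level p.

    lambda comes from (C.4): 2 * (1 - p) * lambda * dE/dp = 1;
    cost inverts the learning model p = 1 - exp(-2 * lambda * c) / 2.
    """
    q = 1.0 - p
    lam = 1.0 / (2.0 * q * dE_dp(p))
    cost = -math.log(2.0 * q) / (2.0 * lam)
    return lam, cost, E(p) - cost


if __name__ == "__main__":
    # Illustrative input: conventional +1/-1 marking, E(p) = 2p - 1, dE/dp = 2.
    for p in (0.95, 0.75):
        lam, cost, profit = profit_curve_point(p, lambda x: 2.0 * x - 1.0, lambda x: 2.0)
        print(f"p={p:.2f}  lambda={lam:.3f}  cost={cost:.5f}  profit={profit:.5f}")
```

Sweeping `p` over a grid and plotting `profit` against `lam` traces the curve described above without ever solving (C.4) for $p$.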

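Example 1: Consider the logarithmic scoring system, $S(p) = \log(2p)/\log 2$. Then we have

$$\lambda(p) = \frac{\log 2}{2(1-p)\log\bigl(p/(1-p)\bigr)} ,$$

$$E(p) - c = \frac{p\log(2p) + (1-p)\log\bigl(2(1-p)\bigr) + (1-p)\log\bigl(p/(1-p)\bigr)\log\bigl(2(1-p)\bigr)}{\log 2} . \qquad \text{(C.5)}$$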


Carrying out these calculations yields the following values:

       p    lambda      cost     profit
    0.99     7.542   0.25934    0.65986
    0.98     4.453   0.36146    0.49710
    0.97     3.323   0.42327    0.38233
    0.96     2.726   0.46321    0.29449
    0.95     2.354   0.48906    0.22454
    0.94     2.099   0.50500    0.16756
    0.93     1.914   0.51360    0.12048
    0.92     1.774   0.51658    0.08124
    0.91     1.664   0.51514    0.04839
    0.90     1.577   0.51018    0.02082
    0.89     1.507   0.50238   -0.00229
    0.88     1.450   0.49226   -0.02163

Note that for $p$ less than about 0.9 there is really no value of $\lambda$ for which such a $p$ is optimal, since it is better not to study at all (and get zero profit) than to do any studying and get a negative profit.
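The table can be regenerated from (C.4) and (C.5) in a few lines. This sketch (function name hypothetical, natural logarithms throughout) prints the rows for $p = 0.99$ down to $0.88$; differences in the last printed digit would be expected from rounding.

```python
import math

LOG2 = math.log(2.0)


def log_rule_row(p: float):
    """(lambda, cost, profit) for the normalized logarithmic rule S(p) = log(2p) / log 2."""
    q = 1.0 - p
    expected = (p * math.log(2 * p) + q * math.log(2 * q)) / LOG2   # E(p)
    slope = math.log(p / q) / LOG2                                  # dE/dp
    lam = 1.0 / (2.0 * q * slope)                                   # from (C.4)
    cost = -math.log(2 * q) / (2.0 * lam)                           # invert the learning model
    return lam, cost, expected - cost


if __name__ == "__main__":
    print("   p   lambda     cost    profit")
    for k in range(99, 87, -1):
        p = k / 100
        lam, cost, profit = log_rule_row(p)
        print(f"{p:.2f}  {lam:6.3f}  {cost:.5f}  {profit:8.5f}")
```

The sign change of the profit column between $p = 0.89$ and $p = 0.90$ is what the remark above about $p$ "less than about 0.9" refers to.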


Example 2: Now let us turn to the quadratic scoring system, $S(p) = 1 - 4(p-1)^{2}$. Then

$$\lambda(p) = \frac{1}{2(1-p)\bigl[4(2p-1)\bigr]} ,$$

$$E(p) - c = (1-2p)^{2} + (1-p)\,4(2p-1)\log\bigl(2(1-p)\bigr) . \qquad \text{(C.6)}$$

Carrying out these calculations leads to the following:

       p    lambda      cost     profit
    0.99    12.755   0.15335    0.80705
    0.98     6.510   0.24721    0.67439
    0.97     4.433   0.31735    0.56625
    0.96     3.397   0.37179    0.47461
    0.95     2.778   0.41447    0.39553
    0.94     2.367   0.44780    0.32660
    0.93     2.076   0.47344    0.26616
    0.92     1.860   0.49260    0.21300
    0.91     1.694   0.50621    0.16619
    0.90     1.563   0.51502    0.12498
    0.89     1.457   0.51965    0.08875
    0.88     1.371   0.52061    0.05699
    0.87     1.299   0.51835    0.02925
    0.86     1.240   0.51326    0.00514
    0.85     1.190   0.50567   -0.01567

If $\lambda$ is less than about 1.2, it is better not to study at all, and accept zero profit, for no finite amount of effort expended will lead to a commensurate reward.
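To locate the break-even point more precisely, one can bisect on $p$ for the value at which the profit in (C.6) changes sign. In the sketch below (names and bracket are illustrative), the crossing should fall between the rows for $p = 0.85$ and $0.86$, with $\lambda$ near 1.23. It also appears to coincide with the critical mastery level $0.85766593\ldots$ found for this rule in Appendix B. That is what one would expect: zero optimal profit requires $E = c$ and $dE/dc = 1$ together, which is the same tangency condition $c\,dE/dc = E$ that maximizes $E/c$.

```python
import math


def quadratic_row(p: float):
    """(lambda, cost, profit) for the normalized quadratic rule, from (C.4) and (C.6)."""
    q = 1.0 - p
    lam = 1.0 / (2.0 * q * 4.0 * (2.0 * p - 1.0))   # (C.4) with dE/dp = 4(2p - 1)
    cost = -math.log(2.0 * q) / (2.0 * lam)          # invert p = 1 - exp(-2 lam c) / 2
    return lam, cost, (2.0 * p - 1.0) ** 2 - cost


def break_even_p(lo: float = 0.80, hi: float = 0.95, tol: float = 1e-12) -> float:
    """Bisection for the p at which the optimal profit changes sign (assumed bracket)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quadratic_row(mid)[2] < 0:   # still losing money: crossing lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p0 = break_even_p()
    print(f"break-even p = {p0:.8f}, lambda = {quadratic_row(p0)[0]:.4f}")
```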

Example 3: As a final example, consider a normal true-false test. This is not an admissible scoring system, but we can see that $E(p) = 2p - 1$ ($p \ge \tfrac{1}{2}$). Thus

$$\lambda(p) = \frac{1}{4(1-p)} ,$$

$$E(p) - c = 2p - 1 + 2(1-p)\log\bigl(2(1-p)\bigr) . \qquad \text{(C.7)}$$


Computation yields the following:

       p    lambda      cost     profit
    0.95     5.000   0.23026    0.66974
    0.90     2.500   0.32189    0.47811
    0.85     1.667   0.36119    0.33881
    0.80     1.250   0.36652    0.23348
    0.75     1.000   0.34657    0.15343
    0.70     0.833   0.30650    0.09350
    0.65     0.714   0.24967    0.05033
    0.60     0.625   0.17851    0.02149
    0.55     0.556   0.09482    0.00518
    0.50     0.500   0.0        0.0

If $\lambda < 0.5$, then any positive amount of study is unremunerative.
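For the conventional test the "ordinary" approach of fixing $\lambda$ and solving for $c$ is also easy, because with $2p - 1 = 1 - \exp(-2\lambda c)$ the profit is simply $1 - \exp(-2\lambda c) - c$. The sketch below (hypothetical names, crude grid search that includes $c = 0$) should reproduce the table rows for $\lambda = 5$ and $\lambda = 1$, and for $\lambda = 0.4$ it should pick $c = 0$, consistent with the remark just made.

```python
import math


def profit(lam: float, c: float) -> float:
    """E(p) - c for +1/-1 marking, with E = 2p - 1 and p = 1 - exp(-2*lam*c) / 2."""
    return 1.0 - math.exp(-2.0 * lam * c) - c


def best_effort(lam: float, c_max: float = 5.0, steps: int = 100_000) -> float:
    """Grid search over c in [0, c_max]; c = 0 means not studying at all."""
    grid = (c_max * i / steps for i in range(steps + 1))
    return max(grid, key=lambda c: profit(lam, c))


if __name__ == "__main__":
    for lam in (5.0, 1.0, 0.4):
        c = best_effort(lam)
        print(f"lambda={lam}: c*={c:.5f}, profit={profit(lam, c):.5f}")
```

Setting the derivative $2\lambda\exp(-2\lambda c) - 1$ to zero gives the closed form $c^* = \log(2\lambda)/(2\lambda)$ for $\lambda > \tfrac{1}{2}$, which the grid search only approximates; for $\lambda \le \tfrac{1}{2}$ the derivative is never positive, so $c^* = 0$.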
Appendix D

SOME  RESULTS  OF  OPTIMAL  STRATEGIES  TO 
ACHIEVE  A PASSING  GRADE 


Let us suppose a student faces a test consisting of $N$ questions, each with two alternative answers. He knows the test will be scored using an admissible scoring system $S$, but that the only thing that matters is that his total score exceeds a certain "passing threshold" $T$. Suppose all the questions "look alike" to him, in the sense that on each question he feels there is probability $p \ge \tfrac{1}{2}$ that one alternative is correct, and probability $1 - p$ that the other is correct. Assume also that the questions are independent (in the stochastic sense). Then the student will perceive that his chance of ascribing the higher probability to the correct alternative on exactly $K$ out of $N$ questions is exactly


$$\binom{N}{K}\, p^{K}(1-p)^{N-K} . \qquad \text{(D.1)}$$

If he makes the same response ($(r, 1-r)$, $r \ge \tfrac{1}{2}$) on each question, then the value ($V$) of his score, if he ascribes the higher probability to the correct alternative on $K$ out of $N$ questions, will be

$$V(K, r) = K\,S(r) + (N-K)\,S(1-r) . \qquad \text{(D.2)}$$


If $r > \tfrac{1}{2}$, $S(r) > S(1-r)$. Therefore, if $K_1 > K_2$, then $V(K_1, r) > V(K_2, r)$. Now let us define $K^*(r)$ as follows:

$$K^*(r) = \frac{T - N\,S(1-r)}{S(r) - S(1-r)} . \qquad \text{(D.3)}$$


$K^*(r)$ is not necessarily an integer. By virtue of the above equation, however, if $K$ is an integer such that $K > K^*(r)$, then

$$K\,S(r) + (N-K)\,S(1-r) > T . \qquad \text{(D.4)}$$
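Equations (D.1) through (D.4) translate directly into a pass-probability calculation. The sketch below assumes, for illustration only, the normalized quadratic rule $S(x) = 1 - 4(1-x)^2$; `S_quad`, `k_star` and `pass_probability` are hypothetical names. It sums (D.1) over the values of $K$ whose score (D.2) exceeds $T$, which by (D.4) is the same as summing over the integers $K > K^*(r)$ (barring an exact tie at $T$).

```python
import math


def S_quad(x: float) -> float:
    """Normalized quadratic scoring rule S(x) = 1 - 4(1 - x)^2 (illustrative choice)."""
    return 1.0 - 4.0 * (1.0 - x) ** 2


def k_star(r: float, N: int, T: float, S=S_quad) -> float:
    """K*(r) from (D.3); needs r > 1/2 so that S(r) > S(1 - r)."""
    return (T - N * S(1.0 - r)) / (S(r) - S(1.0 - r))


def pass_probability(r: float, p: float, N: int, T: float, S=S_quad) -> float:
    """P(total score > T) when the response (r, 1 - r) is given on all N questions."""
    total = 0.0
    for K in range(N + 1):
        score = K * S(r) + (N - K) * S(1.0 - r)                  # V(K, r), (D.2)
        if score > T:
            total += math.comb(N, K) * p**K * (1 - p)**(N - K)   # (D.1)
    return total


if __name__ == "__main__":
    N, T, p = 10, 2.0, 0.7          # hypothetical test parameters
    for r in (0.65, 0.72, 0.85, 0.95):
        print(f"r={r:.2f}  K*={k_star(r, N, T):.3f}  P(pass)={pass_probability(r, p, N, T):.4f}")
```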
Therefore, the student will maximize his probability of "passing the test" (i.e., getting a score greater than $T$) if he selects that $r$ which minimizes $K^*(r)$. I assert that the optimum value of $r$ (which we shall call $r^*$) is that $r$ which satisfies the equation

$$r^*S(r^*) + (1-r^*)\,S(1-r^*) = \frac{T}{N} . \qquad \text{(D.5)}$$

By substituting (D.5) in (D.3) we see that

$$\frac{K^*(r^*)}{N} = r^* . \qquad \text{(D.6)}$$


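Now consider some $r \ne r^*$, $r > \tfrac{1}{2}$. We will show that $K^*(r) > K^*(r^*)$, thus proving that $r^*$ is an optimal response. By definition of what an admissible scoring system is, we know that

$$\frac{K^*(r^*)}{N}\,S(r^*) + \left(1 - \frac{K^*(r^*)}{N}\right)S(1-r^*) > \frac{K^*(r^*)}{N}\,S(r) + \left(1 - \frac{K^*(r^*)}{N}\right)S(1-r) . \qquad \text{(D.7)}$$

Multiplying (D.7) by $N$, noting that its left-hand side then equals $T$ by (D.5) and (D.6), and using (D.3), we deduce

$$K^*(r)\,S(r) + \bigl(N - K^*(r)\bigr)S(1-r) = T > K^*(r^*)\,S(r) + \bigl(N - K^*(r^*)\bigr)S(1-r) . \qquad \text{(D.8)}$$

Thus, subtracting $N\,S(1-r)$ from both sides and dividing by $S(r) - S(1-r)$, which is positive for $r > \tfrac{1}{2}$,

$$K^*(r)\bigl[S(r) - S(1-r)\bigr] > K^*(r^*)\bigl[S(r) - S(1-r)\bigr], \qquad \text{so} \qquad K^*(r) > K^*(r^*) . \qquad \text{(D.9)}$$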
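A numerical check of this argument, again assuming the normalized quadratic rule and the hypothetical values $N = 10$, $T = 2$. For this rule $rS(r) + (1-r)S(1-r) = (2r-1)^2$, so (D.5) has the closed form $r^* = \tfrac{1}{2}\bigl(1 + \sqrt{T/N}\bigr)$, which the bisection should reproduce; note that the subjective probability $p$ appears nowhere in the code. The sketch then checks (D.6) and scans a grid of $r$ to confirm that no grid point gives a smaller $K^*(r)$. Function names are hypothetical.

```python
import math


def S_quad(x: float) -> float:
    """Normalized quadratic rule, used here only for illustration."""
    return 1.0 - 4.0 * (1.0 - x) ** 2


def k_star(r: float, N: int, T: float, S=S_quad) -> float:
    """K*(r) from (D.3), defined for r > 1/2."""
    return (T - N * S(1.0 - r)) / (S(r) - S(1.0 - r))


def r_star(N: int, T: float, S=S_quad, tol: float = 1e-13) -> float:
    """Solve (D.5), r S(r) + (1 - r) S(1 - r) = T / N, by bisection on [1/2, 1].

    Assumes the left side increases on [1/2, 1] and S(1/2) < T / N < S(1).
    """
    def g(r: float) -> float:
        return r * S(r) + (1.0 - r) * S(1.0 - r) - T / N

    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    N, T = 10, 2.0                   # hypothetical test parameters
    rs = r_star(N, T)
    grid_min = min(k_star(0.5 + k / 10_000, N, T) for k in range(1, 5_001))
    print(f"r* = {rs:.6f}  K*(r*)/N = {k_star(rs, N, T) / N:.6f}  grid min K* = {grid_min:.6f}")
```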

The fact that the solution to (D.5) is an optimal response in this setting is a striking illustration of the fact that nonlinear utility for score may destroy the admissible property of a scoring system, for Eq. (D.5) does not depend upon the student's subjective probability $p$ at all!

It is interesting to note, by the way, that the optimal total test strategy may not involve making the same response on all questions, even when the student's subjective probabilities on all questions are the same. For example, if $T = S(1)$ and he is completely uninformed ($p = \tfrac{1}{2}$) on all questions, then he will secure a 50 percent chance of passing by making a $(1, 0)$ response on one question and a $(\tfrac{1}{2}, \tfrac{1}{2})$ response on all the rest. This is manifestly a better chance than he can secure by any strategy that calls for the same $(r, 1-r)$ response on every question.
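The example can be checked by brute force. The sketch below assumes the normalized quadratic rule (so $S(1) = 1$, $S(\tfrac{1}{2}) = 0$, $S(0) = -3$), a hypothetical test of $N = 10$ questions, $p = \tfrac{1}{2}$, and $T = S(1)$. It also treats a total exactly equal to $T$ as passing, which is what the 50 percent figure requires. It compares the mixed strategy with the best identical response $(r, 1-r)$ over a grid of $r$ values; the function names are hypothetical.

```python
import math


def S_quad(x: float) -> float:
    """Normalized quadratic rule: S(1) = 1, S(1/2) = 0, S(0) = -3 (illustrative choice)."""
    return 1.0 - 4.0 * (1.0 - x) ** 2


EPS = 1e-12  # tolerance so that a total exactly equal to T counts as passing


def best_uniform_pass_prob(N: int, T: float, p: float = 0.5, S=S_quad) -> float:
    """Best P(total >= T) over identical responses (r, 1 - r), r on a grid in [1/2, 1]."""
    best = 0.0
    for j in range(501):
        r = 0.5 + j / 1000
        prob = sum(math.comb(N, K) * p**K * (1 - p)**(N - K)
                   for K in range(N + 1)
                   if K * S(r) + (N - K) * S(1.0 - r) >= T - EPS)
        best = max(best, prob)
    return best


def mixed_pass_prob(N: int, T: float, p: float = 0.5, S=S_quad) -> float:
    """P(total >= T) for a (1, 0) response on one question and (1/2, 1/2) on the other N - 1."""
    rest = (N - 1) * S(0.5)
    prob = 0.0
    if S(1.0) + rest >= T - EPS:   # the (1, 0) response backs the correct alternative
        prob += p
    if S(0.0) + rest >= T - EPS:   # the (1, 0) response backs the wrong alternative
        prob += 1.0 - p
    return prob


if __name__ == "__main__":
    N, T = 10, S_quad(1.0)          # hypothetical: 10 questions, threshold T = S(1)
    print("mixed:", mixed_pass_prob(N, T), " best uniform:", best_uniform_pass_prob(N, T))
```

For these assumed values the best identical response passes only when at least 7 of the 10 answers favor the correct alternative, a probability of $176/1024 \approx 0.17$, against 0.5 for the mixed strategy.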